Tutorial: Learning and Inference in Markov Logic Networks
==========================================================

This tutorial explains how to learn the parameters of a Markov logic network from a training database and how to use the resulting model to answer queries. We will make use of the well-known ''smoking scenario'' as used by Richardson and Domingos. We work in the ``examples/smokers`` directory.

The Smoking Scenario
^^^^^^^^^^^^^^^^^^^^

The smoking scenario models the dependency between smoking and having cancer. Moreover, we consider the social network induced by a ''friends'' relation. We thus define the following predicates::

    Smokes(person)
    Cancer(person)
    Friends(person,person)

and define a rule stating that cancer follows from smoking::

    Smokes(p) => Cancer(p)

and further model the symmetry of the friendship relation and its influence on smoking habits::

    Friends(p1,p2) <=> Friends(p2,p1)
    Friends(p1,p2) => (Smokes(p1) <=> Smokes(p2))

The last rule states that if two persons are friends, they either both smoke or both do not smoke.

Learning
^^^^^^^^

We use the :doc:`mlnlearningtool` to learn the weights of the Markov logic network::

    cd /path/to/probcog/examples/smokers
    mlnlearn

We pick the "PRACMLN" engine and the MLN defined above (i.e. ``smoking.mln``). The MLN is displayed in the editor. To use the internal engine, one has to add a ``0.0`` weight to all formulas that are not terminated by a period, since the internal engine does not implicitly assume a ``0.0`` weight for formulas given without weights. Next, we choose the desired training database (e.g. ``smoking-train.db``) and learning method (e.g. ''pseudo-log-likelihood with blocking''). Having made our selections, we start the learning process by clicking the ''Learn'' button at the bottom of the dialog, which gives us weights, e.g.::

    1.126769  Smokes(x) => Cancer(x)
    1.577776  Friends(x, y) => (Smokes(x) <=> Smokes(y))

The resulting MLN is saved to the filename we entered under ''Output filename''.

Inference
^^^^^^^^^

We now invoke the :doc:`mlnquerytool` from the console::

    mlnquery

To test the model described and trained above, we consider the following evidence database::

    Cancer(Ann)
    !Cancer(Bob)
    !Friends(Ann,Bob)

Using this evidence, we want to infer the smoking habits of Ann and Bob. Our queries include ``Smokes``, ``Smokes(Ann) ^ Smokes(Bob)`` and ``Smokes(Ann) v Smokes(Bob)``. For this small evidence database, we can still use exact inference and obtain the following results::

    0.436830  Smokes(Ann)
    0.152667  Smokes(Ann) ^ Smokes(Bob)
    0.528921  Smokes(Ann) v Smokes(Bob)
    0.244758  Smokes(Bob)

As expected, it is more likely for Ann to smoke than for Bob.

Exact inference can also give us the full distribution over possible worlds, which we obtain by passing ``debug=True`` as an additional parameter. The first 3 of the 256 possible worlds::

      1  0.81 %  Friends(Ann,Ann) Friends(Ann,Bob) Friends(Bob,Ann) Friends(Bob,Bob)
                 Smokes(Ann) Smokes(Bob) Cancer(Ann) Cancer(Bob)
                 5.242963e+03 <- 8.56 <- 1.1 1.1 1.6 1.6 1.6 1.6
      2  0.26 %  Friends(Ann,Ann) Friends(Ann,Bob) Friends(Bob,Ann) Friends(Bob,Bob)
                 Smokes(Ann) Smokes(Bob) Cancer(Ann) !Cancer(Bob)
                 1.699132e+03 <- 7.44 <- 1.1 1.6 1.6 1.6 1.6
      3  0.26 %  Friends(Ann,Ann) Friends(Ann,Bob) Friends(Bob,Ann) Friends(Bob,Bob)
                 Smokes(Ann) Smokes(Bob) !Cancer(Ann) Cancer(Bob)
                 1.699132e+03 <- 7.44 <- 1.1 1.6 1.6 1.6 1.6

The last line of each entry (i.e. every third line in the listing above) contains the exponentiated sum of weights, the sum of weights, and the individual (rounded) weights that were summed.
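To make the meaning of these numbers concrete, the following Python sketch enumerates all 256 possible worlds, sums the weights of the satisfied ground formulas in each world and normalizes the exponentiated sums. This is only an illustration of the underlying computation, not ProbCog's actual implementation, and it assumes that only the two weighted formulas listed above contribute to the weight of a world::

    import itertools
    import math

    # Learned weights from the learning step above
    W_CANCER  = 1.126769   # Smokes(x) => Cancer(x)
    W_FRIENDS = 1.577776   # Friends(x,y) => (Smokes(x) <=> Smokes(y))

    PEOPLE = ["Ann", "Bob"]

    # Ground atoms: 4 x Friends, 2 x Smokes, 2 x Cancer  ->  2^8 = 256 possible worlds
    ATOMS = ([("Friends", a, b) for a in PEOPLE for b in PEOPLE] +
             [("Smokes", p) for p in PEOPLE] +
             [("Cancer", p) for p in PEOPLE])

    def weight_sum(world):
        """Sum of the weights of all ground formulas satisfied in the given world."""
        total = 0.0
        for p in PEOPLE:
            if not world[("Smokes", p)] or world[("Cancer", p)]:
                total += W_CANCER     # Smokes(p) => Cancer(p) is satisfied
        for p1, p2 in itertools.product(PEOPLE, repeat=2):
            if not world[("Friends", p1, p2)] or world[("Smokes", p1)] == world[("Smokes", p2)]:
                total += W_FRIENDS    # Friends(p1,p2) => (Smokes(p1) <=> Smokes(p2)) is satisfied
        return total

    # Enumerate all 256 possible worlds and normalize the exponentiated weight sums
    worlds = [dict(zip(ATOMS, truth))
              for truth in itertools.product([True, False], repeat=len(ATOMS))]
    expsums = [math.exp(weight_sum(w)) for w in worlds]
    Z = sum(expsums)

    # worlds[0] sets all atoms to true, which corresponds to world 1 in the debug output
    print("exponentiated sum of weights: %e" % expsums[0])                # ~5.24e+03
    print("probability of the world:     %.2f %%" % (100 * expsums[0] / Z))  # ~0.81 %

Running the sketch should roughly reproduce the values shown for world 1 above.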
We can also use MC-SAT for this model and evidence database. We set the maximum number of steps to 5000, set SampleSAT's p parameter to ``0.6`` and control the intermediate output using the additional parameters ``p=0.6, infoInterval=500, resultsInterval=1000``. We obtain::

    0.449800  Smokes(Ann)
    0.146000  Smokes(Ann) ^ Smokes(Bob)
    0.548800  Smokes(Ann) v Smokes(Bob)
    0.245000  Smokes(Bob)

We observe that MC-SAT gives us reasonable approximate results (compared to the exact solution calculated above) for this evidence database.
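Why a few thousand steps already yield good estimates can be illustrated with a small extension of the enumeration sketch above (again only an illustration, not the MC-SAT algorithm itself): MC-SAT produces samples that are approximately distributed according to the model conditioned on the evidence, and each query probability is estimated as the fraction of samples in which the query holds. Here we mimic that estimator by sampling directly from the exact conditional distribution, reusing ``worlds`` and ``expsums`` from the previous sketch::

    import random

    # Evidence from the database above
    EVIDENCE = {("Cancer", "Ann"): True,
                ("Cancer", "Bob"): False,
                ("Friends", "Ann", "Bob"): False}

    # Restrict to the worlds that are consistent with the evidence; the conditional
    # distribution over these worlds is what MC-SAT approximately samples from.
    consistent = [(w, e) for w, e in zip(worlds, expsums)
                  if all(w[atom] == value for atom, value in EVIDENCE.items())]

    # Draw 5000 samples from the exact conditional distribution (for illustration only;
    # MC-SAT achieves this via a Markov chain without ever enumerating the worlds)
    samples = random.choices([w for w, _ in consistent],
                             weights=[e for _, e in consistent], k=5000)

    # A sampling-based estimate of a query probability is simply the fraction of
    # samples in which the query formula holds
    p_ann = sum(w[("Smokes", "Ann")] for w in samples) / len(samples)
    p_disj = sum(w[("Smokes", "Ann")] or w[("Smokes", "Bob")] for w in samples) / len(samples)
    print("Smokes(Ann):               ~%.4f" % p_ann)    # close to 0.4368
    print("Smokes(Ann) v Smokes(Bob): ~%.4f" % p_disj)   # close to 0.5289

With 5000 samples the estimates typically deviate from the exact values by well under one percentage point, which is consistent with the MC-SAT results reported above.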