Tutorial: Learning and Inference in Markov Logic Networks

This tutorial explains how to learn the parameters of a Markov logic network from a training database and how to use the resulting model to answer queries. We will make use of the well-known smoking scenario as used by Richardson and Domingos.

We work in the examples/smokers directory.

The Smoking Scenario

The Smoking scenario models the dependencies between smoking and having cancer. Moreover, we consider the social network induced by a "friends" relation. We thus define the following predicates:

Smokes(person)
Cancer(person)
Friends(person,person)

and define a rule stating that cancer follows from smoking::

Smokes(p) => Cancer(p)

and further model the symmetry of the friendship relation and its influence on smoking habits:

Friends(p1,p2) <=> Friends(p2,p1)
Friends(p1,p2) => (Smokes(p1) <=> Smokes(p2))

The last rule states that if two persons are friends, they either both smoke or both do not smoke.
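Putting these pieces together, the model file smoking.mln can be expected to look roughly as follows (a sketch; the actual file in examples/smokers may differ in details). Note that the symmetry rule is plausibly a hard formula, terminated by a period, which would explain why the learning result below reports weights only for the other two formulas::

// predicate declarations
Smokes(person)
Cancer(person)
Friends(person,person)

// hard formula (terminated by a period): friendship is symmetric
Friends(p1,p2) <=> Friends(p2,p1).

// soft formulas whose weights are to be learned
Smokes(p) => Cancer(p)
Friends(p1,p2) => (Smokes(p1) <=> Smokes(p2))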

Learning

We use the MLN-Learning-Tool to learn the weights of the Markov logic network::

cd /path/to/probcog/examples/smokers
mlnlearn

We pick the "PRACMLN" engine and the MLN defined above (i.e. smoking.mln); the MLN is then displayed in the editor. To use the internal engine instead, one has to add a 0.0 weight to all formulas that are not terminated by a period, as the internal engine does not implicitly assume a 0.0 weight for formulas without weights.
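For illustration, the formulas prepared for the internal engine might then look as follows (a sketch): the symmetry rule is kept hard via the terminating period, while the remaining formulas receive an explicit 0.0 initial weight::

Friends(p1,p2) <=> Friends(p2,p1).
0.0  Smokes(p) => Cancer(p)
0.0  Friends(p1,p2) => (Smokes(p1) <=> Smokes(p2))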

Next, we choose the desired training database (e.g. smoking-train.db) and learning method (e.g. "pseudo-log-likelihood with blocking").
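For reference, pseudo-log-likelihood (as used for MLNs by Richardson and Domingos) replaces the intractable log-likelihood of the training database by the sum of the conditional log-probabilities of the individual ground atoms given their Markov blankets::

log P*_w(X=x) = sum_{l=1..n} log P_w(X_l = x_l | MB_x(X_l))

Here, X_l ranges over all ground atoms, x_l is the truth value of X_l in the training database, and MB_x(X_l) denotes the state of X_l's Markov blanket; the variant with blocking additionally treats blocks of mutually exclusive ground atoms as single variables.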

Having made our selections, we start the learning process by clicking the "Learn" button at the bottom of the dialog, which gives us weights, e.g.::

1.126769  Smokes(x) => Cancer(x)
1.577776  Friends(x, y) => (Smokes(x) <=> Smokes(y))

The resulting MLN is saved to the filename we entered under "Output filename".

Inference

We now invoke the MLN Query-Tool from the console::

mlnquery

To test the model described and trained above, we consider the following evidence database::

Cancer(Ann)
!Cancer(Bob)
!Friends(Ann,Bob)

Using this evidence, we want to infer the smoking habits of Ann and Bob. Our queries are Smokes (i.e. all ground atoms of this predicate), Smokes(Ann) ^ Smokes(Bob), and Smokes(Ann) v Smokes(Bob). For this small evidence database, we can still use exact inference. We obtain the following results::

0.436830  Smokes(Ann)
0.152667  Smokes(Ann) ^ Smokes(Bob)
0.528921  Smokes(Ann) v Smokes(Bob)
0.244758  Smokes(Bob)

As expected, it is more likely for Ann to smoke than for Bob: the evidence Cancer(Ann) supports Smokes(Ann) via the first rule, while !Cancer(Bob) speaks against Smokes(Bob).
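As a quick sanity check, the four results are mutually consistent, since by inclusion-exclusion::

P(Smokes(Ann) v Smokes(Bob)) = P(Smokes(Ann)) + P(Smokes(Bob)) - P(Smokes(Ann) ^ Smokes(Bob))
                             = 0.436830 + 0.244758 - 0.152667
                             = 0.528921

which is exactly the value reported for the disjunction.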

Exact inference can also give us the full distribution over possible worlds, which we obtain by passing debug=True as an additional parameter. These are the first 3 of the 256 possible worlds::

1   0.81%   Friends(Ann,Ann)  Friends(Ann,Bob)  Friends(Bob,Ann)  Friends(Bob,Bob)
            Smokes(Ann)  Smokes(Bob)  Cancer(Ann)   Cancer(Bob)
            5.242963e+03 <- 8.56 <- 1.1 1.1 1.6 1.6 1.6 1.6
2   0.26%   Friends(Ann,Ann)  Friends(Ann,Bob)  Friends(Bob,Ann)  Friends(Bob,Bob)
            Smokes(Ann)  Smokes(Bob)  Cancer(Ann)   !Cancer(Bob)
            1.699132e+03 <- 7.44 <- 1.1 1.6 1.6 1.6 1.6
3   0.26%   Friends(Ann,Ann)  Friends(Ann,Bob)  Friends(Bob,Ann)  Friends(Bob,Bob)
            Smokes(Ann)  Smokes(Bob) !Cancer(Ann)   Cancer(Bob)
            1.699132e+03 <- 7.44 <- 1.1 1.6 1.6 1.6 1.6

The end of each entry (i.e. every third line in the results above) contains the exponentiated sum of weights, the sum of weights, and the individual weights (rounded) that were summed.
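For example, in world 1 both groundings of the first formula and all four groundings of the second formula are satisfied, so with the learned weights from above we get::

2 * 1.126769 + 4 * 1.577776 = 8.564642
exp(8.564642) ≈ 5242.963

which matches the 8.56 and 5.242963e+03 shown for world 1; the probability of 0.81% is obtained by normalizing this value by the sum of the corresponding values over all 256 worlds.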

We can also use MC-SAT for this model and evidence database. We set the maximum number of steps to 5000, set SampleSAT's p parameter to 0.6 and control the intermediate output using the additional parameters p=0.6, infoInterval=500, resultsInterval=1000. We obtain::

0.449800  Smokes(Ann)
0.146000  Smokes(Ann) ^ Smokes(Bob)
0.548800  Smokes(Ann) v Smokes(Bob)
0.245000  Smokes(Bob)

We observe that MC-SAT gives us reasonable approximations (compared to the exact solution calculated above) for this evidence database; all four query probabilities are within about 0.02 of the exact values.