Tutorial: Learning and Inference in Markov Logic Networks
This tutorial will explain how to learn the parameters of a Markov
logic network from a training database and how to use the resulting
model to answer queries. We will make use of the well-known
smoking scenario
as used by Richardson and Domingos.
We work in the examples/smokers
directory.
The Smoking Scenario
The smoking scenario models the dependencies between smoking and having cancer. Moreover, we consider the social network induced by a “friends” relation. We thus define the following predicates:
Smokes(person)
Cancer(person)
Friends(person,person)
and define a rule stating that cancer follows from smoking:
Smokes(p) => Cancer(p)
and further model the symmetry of the friendship relation and its influence on smoking habits:
Friends(p1,p2) <=> Friends(p2,p1)
Friends(p1,p2) => (Smokes(p1) <=> Smokes(p2))
The last rule states that if two persons are friends, they either both smoke or both do not smoke.
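Putting the declarations and rules together, the model file smoking.mln might look as follows (a sketch for illustration; the file shipped with the example may differ in its details):
// predicate declarations
Smokes(person)
Cancer(person)
Friends(person,person)
// smoking causes cancer
Smokes(p) => Cancer(p)
// friendship is symmetric, and friends have similar smoking habits
Friends(p1,p2) <=> Friends(p2,p1)
Friends(p1,p2) => (Smokes(p1) <=> Smokes(p2))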
Learning
We use the MLN learning tool to learn the weights of the Markov logic network:
cd /path/to/probcog/examples/smokers
mlnlearn
We pick the “PRACMLN” engine and the MLN defined above (i.e. smoking.mln). The MLN is displayed in the editor. To use the internal engine, one has to add a 0.0 weight to all formulas that are not terminated by a period, since the internal engine does not implicitly assume a 0.0 weight for formulas without weights.
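For example (illustrative), a formula can be given an explicit initial weight or be declared hard by terminating it with a period:
0.0 Smokes(p) => Cancer(p)          // soft formula with an explicit initial weight of 0.0
Friends(p1,p2) <=> Friends(p2,p1).  // hard formula, terminated by a period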
Next, we choose the desired training database (e.g. smoking-train.db) and learning method (e.g. “pseudo-log-likelihood with blocking”).
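A training database is simply a list of the ground atoms that hold in the training world, with negated atoms prefixed by !. An illustrative excerpt (not the actual contents of smoking-train.db):
Smokes(Ann)
Cancer(Ann)
!Smokes(Bob)
!Cancer(Bob)
Friends(Ann,Bob)
Friends(Bob,Ann)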
Having made our selections, we start the learning process by clicking the “Learn” button at the bottom of the dialog, which gives us weights, e.g.:
1.126769 Smokes(x) => Cancer(x)
1.577776 Friends(x, y) => (Smokes(x) <=> Smokes(y))
The resulting MLN is saved to the filename we entered under “Output filename”.
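The saved file is again an ordinary MLN file with the learned weights prepended to the formulas, along the lines of (sketch):
Smokes(person)
Cancer(person)
Friends(person,person)
1.126769  Smokes(x) => Cancer(x)
1.577776  Friends(x, y) => (Smokes(x) <=> Smokes(y))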
Inference
We now invoke the MLN query tool from the console:
mlnquery
To test the model described and trained above, we consider the following evidence database:
Cancer(Ann)
!Cancer(Bob)
!Friends(Ann,Bob)
Using this evidence, we want to infer the smoking habits of Ann and Bob. Our queries include Smokes (i.e. all its groundings, Smokes(Ann) and Smokes(Bob)), Smokes(Ann) ^ Smokes(Bob), and Smokes(Ann) v Smokes(Bob).
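In the query tool, these can be entered as a comma-separated list in the query field (assuming the tool’s usual query syntax):
Smokes, Smokes(Ann) ^ Smokes(Bob), Smokes(Ann) v Smokes(Bob)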
For this small evidence database, we can still use exact inference.
We obtain the following results:
0.436830 Smokes(Ann)
0.152667 Smokes(Ann) ^ Smokes(Bob)
0.528921 Smokes(Ann) v Smokes(Bob)
0.244758 Smokes(Bob)
As expected, it is more likely for Ann to smoke than for Bob.
Exact inference can also give us the full distribution over possible worlds, which we obtain by passing debug=True as an additional parameter. The first 3 of the 256 possible worlds:
1 0.81% Friends(Ann,Ann) Friends(Ann,Bob) Friends(Bob,Ann) Friends(Bob,Bob)
Smokes(Ann) Smokes(Bob) Cancer(Ann) Cancer(Bob)
5.242963e+03 <- 8.56 <- 1.1 1.1 1.6 1.6 1.6 1.6
2 0.26% Friends(Ann,Ann) Friends(Ann,Bob) Friends(Bob,Ann) Friends(Bob,Bob)
Smokes(Ann) Smokes(Bob) Cancer(Ann) !Cancer(Bob)
1.699132e+03 <- 7.44 <- 1.1 1.6 1.6 1.6 1.6
3 0.26% Friends(Ann,Ann) Friends(Ann,Bob) Friends(Bob,Ann) Friends(Bob,Bob)
Smokes(Ann) Smokes(Bob) !Cancer(Ann) Cancer(Bob)
1.699132e+03 <- 7.44 <- 1.1 1.6 1.6 1.6 1.6
The last line of each entry in the listing above contains the exponentiated sum of weights, the sum of weights, and the individual (rounded) weights of the satisfied ground formulas that were summed.
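For the first world, for instance, both groundings of the cancer rule and all four groundings of the friends rule are satisfied, so the sum of weights is 2 · 1.126769 + 4 · 1.577776 ≈ 8.56, whose exponential is approximately 5.24e+03, matching the first line above.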
We can also use MC-SAT for this model and evidence database. We set the maximum number of steps to 5000 and SampleSAT’s p parameter to 0.6, and we control the intermediate output, using the additional parameters p=0.6, infoInterval=500, resultsInterval=1000. We obtain:
0.449800 Smokes(Ann)
0.146000 Smokes(Ann) ^ Smokes(Bob)
0.548800 Smokes(Ann) v Smokes(Bob)
0.245000 Smokes(Bob)
We observe that MC-SAT gives us reasonable approximate results (compared to the exact solution computed above) for this evidence database.
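Finally, the same workflow can also be scripted. The following is a rough sketch using the pracmln Python API (the engine behind “PRACMLN”); the class names, parameters and method identifiers are assumptions based on pracmln’s documented interface and may differ between versions, and smoking-test.db is a hypothetical file holding the evidence listed above:
# Sketch only: pracmln API details (class names, parameter names,
# method identifiers) are assumptions and may vary between versions.
from pracmln import MLN, Database, MLNQuery
from pracmln.mlnlearn import MLNLearn

# load the hand-written model and the training data
mln = MLN(mlnfile='smoking.mln', grammar='PRACGrammar', logic='FirstOrderLogic')
dbs = Database.load(mln, 'smoking-train.db')

# learn the weights, e.g. with blocked pseudo-log-likelihood ('BPLL')
learned = MLNLearn(mln=mln, db=dbs, method='BPLL').run()

# run MC-SAT on the evidence database ('smoking-test.db' is hypothetical)
evidence = Database.load(learned, 'smoking-test.db')
result = MLNQuery(mln=learned, db=evidence[0],
                  queries='Smokes(Ann), Smokes(Bob)',
                  method='MC-SAT', maxsteps=5000, p=0.6).run()
result.write()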