Description
3.1 Inference in a chain
[Figure: a chain of nodes X_{1} → X_{2} → X_{3} → ⋯ → X_{T−2} → X_{T−1} → X_{T}]
Consider the belief network shown above with random variables X_{t} ∈ {1, 2, …, m}. Suppose that the CPT at each non-root node is given by the same m × m matrix; that is, for all t ≥ 1, we have:

A_{ij} = P(X_{t+1} = j | X_{t} = i).

(a) Prove that P(X_{t+1} = j | X_{1} = i) = [A^{t}]_{ij}, where A^{t} is the t-th power of the matrix A. Hint: use induction.

(b) Consider the computational complexity of this inference. Devise a simple algorithm, based on matrix-vector multiplication, that scales as O(m^{2}t).
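One way such an algorithm might look, as a sketch only (NumPy and 0-indexed states are assumed, and the function name is illustrative):

```python
import numpy as np

def chain_inference(A, i, t):
    """Return the vector of probabilities P(X_{t+1} = j | X_1 = i).

    Each of the t steps is a vector-matrix product costing O(m^2),
    so the total cost is O(m^2 t)."""
    m = A.shape[0]
    p = np.zeros(m)
    p[i] = 1.0            # point distribution at X_1 = i
    for _ in range(t):    # propagate one step of the chain at a time
        p = p @ A         # p_new[j] = sum_k p[k] * A[k, j]
    return p
```

Here `p @ A` implements the one-step update P(X_{s+1} = j) = Σ_{k} P(X_{s} = k) A_{kj}.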

(c) Show alternatively that the inference can also be done in O(m^{3} log_{2} t) operations.
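A sketch of how this bound might be achieved, using exponentiation by squaring on the matrix A (again assuming NumPy; the interface is illustrative):

```python
import numpy as np

def chain_inference_fast(A, i, t):
    """Return P(X_{t+1} = j | X_1 = i) for all j by computing A^t with
    exponentiation by squaring: O(log2 t) matrix products, each O(m^3)."""
    m = A.shape[0]
    result = np.eye(m)    # accumulates A^t
    power = A.copy()      # holds A, A^2, A^4, ... as we scan the bits of t
    while t > 0:
        if t & 1:                  # this bit of t contributes the current power
            result = result @ power
        power = power @ power      # squaring step: O(m^3)
        t >>= 1
    return result[i]               # row i of A^t
```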

(d) Suppose that the transition matrix A is sparse, with at most s ≪ m nonzero elements per row. Show that in this case the inference can be done in O(smt).
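A sketch of the sparse case, under the assumption (purely for illustration) that each row of A is stored as a list of its at most s nonzero (column, value) pairs:

```python
def sparse_chain_inference(rows, m, i, t):
    """rows[k] lists the (j, A_kj) pairs for the nonzero entries of row k.

    Each of the t steps touches at most s entries in each of the m rows,
    so the total cost is O(s m t)."""
    p = [0.0] * m
    p[i] = 1.0                          # point distribution at X_1 = i
    for _ in range(t):
        q = [0.0] * m
        for k in range(m):              # for every current state k ...
            if p[k] > 0.0:
                for j, a in rows[k]:    # ... spread its mass over <= s successors
                    q[j] += p[k] * a
        p = q
    return p                            # p[j] = P(X_{t+1} = j | X_1 = i)
```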

(e) Show how to compute the posterior probability P(X_{1} = i | X_{T+1} = j) in terms of the matrix A and the prior probability P(X_{1} = i). Hint: use Bayes rule and your answer from part (a).
3.2 More inference in a chain
[Figure: belief network over nodes X_{0}, X_{1}, and Y_{1}]
Consider the simple belief network shown above, with nodes X_{0}, X_{1}, and Y_{1}.
To compute the posterior probability P(X_{1} | Y_{1}), we can use Bayes rule:

P(X_{1} | Y_{1}) = P(Y_{1} | X_{1}) P(X_{1}) / P(Y_{1})
(a) Show how to compute the conditional probability P(Y_{1} | X_{1}) that appears in the numerator of Bayes rule from the CPTs of the belief network.
(b) Show how to compute the marginal probability P(Y_{1}) that appears in the denominator of Bayes rule from the CPTs of the belief network.
Next you will show how to generalize these simple computations when the basic structure of this DAG is repeated to form an extended chain. Like the previous problem, this is another instance of efficient inference in polytrees.

[Figure: extended chain X_{0} → X_{1} → X_{2} → ⋯ → X_{n−1} → X_{n}, with corresponding evidence nodes Y_{1}, Y_{2}, …, Y_{n−1}, Y_{n}]
Consider how to efficiently compute the posterior probability P(X_{n} | Y_{1}, Y_{2}, …, Y_{n}) in the above belief network. One approach is to derive a recursion from the conditionalized form of Bayes rule:

P(X_{n} | Y_{1}, Y_{2}, …, Y_{n}) = P(Y_{n} | X_{n}, Y_{1}, Y_{2}, …, Y_{n−1}) P(X_{n} | Y_{1}, Y_{2}, …, Y_{n−1}) / P(Y_{n} | Y_{1}, Y_{2}, …, Y_{n−1}),

where the nodes Y_{1}, Y_{2}, …, Y_{n−1} are treated as background evidence. In this problem you will express the conditional probabilities on the right hand side of this equation in terms of the CPTs of the network and the probabilities P(X_{n−1} = x | Y_{1}, Y_{2}, …, Y_{n−1}), which you may assume have been computed at a previous step of the recursion. Your answers to (a) and (b) should be helpful here.

(c) Simplify the term P(X_{n} | Y_{1}, Y_{2}, …, Y_{n−1}) that appears in the numerator of Bayes rule.

(d) Show how to compute the conditional probability P(Y_{n} | X_{n}, Y_{1}, Y_{2}, …, Y_{n−1}) that appears in the numerator of Bayes rule. Express your answer in terms of the CPTs of the belief network and the probabilities P(X_{n−1} = x | Y_{1}, Y_{2}, …, Y_{n−1}), which you may assume have already been computed.

(e) Show how to compute the conditional probability P(Y_{n} | Y_{1}, Y_{2}, …, Y_{n−1}) that appears in the denominator of Bayes rule. Express your answer in terms of the CPTs of the belief network and the probabilities P(X_{n−1} = x | Y_{1}, Y_{2}, …, Y_{n−1}), which you may assume have already been computed.
3.3 Node clustering and polytrees
In the figure below, circle the DAGs that are polytrees. In the other DAGs, shade two nodes that could be clustered so that the resulting DAG is a polytree.
3.4 Cutsets and polytrees
Clearly not all problems of inference are intractable in loopy belief networks. As a trivial example, consider the query P(Q | E_{1}, E_{2}) in the network shown below:
[Figure: a loopy network in which E_{1} and E_{2} are the parents of the query node Q]
In this case, because E_{1} and E_{2} are the parents of Q, the query P(Q | E_{1}, E_{2}) can be answered directly from the conditional probability table at node Q.
As a less trivial example, consider how to compute the posterior probability P(Q | E_{1}, E_{2}) in the belief network shown below:
[Figure: a loopy network with evidence nodes E_{1}, E_{2} and query node Q]
In this belief network, the posterior probability P(Q | E_{1}, E_{2}) can be correctly computed by running the polytree algorithm on the subgraph of nodes that are enclosed by the dotted rectangle:
[Figure: the same network, with a dotted rectangle enclosing the polytree subgraph that contains E_{1}, Q, and E_{2}]
In this example, the evidence nodes d-separate the query node from the loopy parts of the network. Thus for this inference the polytree algorithm would terminate before encountering any loops.
For each of the five loopy belief networks shown below, consider how to compute the posterior probability P(Q | E_{1}, E_{2}).
If the inference can be performed by running the polytree algorithm on a subgraph, enclose this subgraph by a dotted line as in the example above. (The subgraph should be a polytree.)
On the other hand, if the inference cannot be performed in this way, shade one node in the belief network that can be instantiated to induce a polytree by the method of cutset conditioning.
[Figures: five loopy belief networks, each with query node Q and evidence nodes E_{1} and E_{2}]
3.5 Node clustering
Consider the belief network shown below over binary variables X, Y_{1}, Y_{2}, Y_{3}, Z_{1}, and Z_{2}. The network can be transformed into a polytree by clustering the nodes Y_{1}, Y_{2}, and Y_{3} into a single node Y . From the CPTs in the original belief network, fill in the missing elements of the CPTs for the polytree.
[Figure: the original network, in which X is a parent of Y_{1}, Y_{2}, Y_{3}, and each Y_{i} is a parent of Z_{1} and Z_{2}; in the polytree, the single cluster node Y replaces Y_{1}, Y_{2}, Y_{3}]

CPTs of the original network:

X | P(Y_{1}=1|X) | P(Y_{2}=1|X) | P(Y_{3}=1|X)
0 | 0.75         | 0.50         | 0.25
1 | 0.50         | 0.25         | 0.75

Y_{1} | Y_{2} | Y_{3} | P(Z_{1}=1|Y_{1},Y_{2},Y_{3}) | P(Z_{2}=1|Y_{1},Y_{2},Y_{3})
0     | 0     | 0     | 0.9                          | 0.1
1     | 0     | 0     | 0.8                          | 0.2
0     | 1     | 0     | 0.7                          | 0.3
0     | 0     | 1     | 0.6                          | 0.4
1     | 1     | 0     | 0.5                          | 0.5
1     | 0     | 1     | 0.4                          | 0.6
0     | 1     | 1     | 0.3                          | 0.7
1     | 1     | 1     | 0.2                          | 0.8

CPTs of the polytree (fill in the missing entries):

Y_{1} | Y_{2} | Y_{3} | Y | P(Y|X=0) | P(Y|X=1) | P(Z_{1}=1|Y) | P(Z_{2}=1|Y)
0     | 0     | 0     | 1 | ___      | ___      | ___          | ___
1     | 0     | 0     | 2 | ___      | ___      | ___          | ___
0     | 1     | 0     | 3 | ___      | ___      | ___          | ___
0     | 0     | 1     | 4 | ___      | ___      | ___          | ___
1     | 1     | 0     | 5 | ___      | ___      | ___          | ___
1     | 0     | 1     | 6 | ___      | ___      | ___          | ___
0     | 1     | 1     | 7 | ___      | ___      | ___          | ___
1     | 1     | 1     | 8 | ___      | ___      | ___          | ___
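For intuition on where the entries P(Y | X) come from: Y_{1}, Y_{2}, and Y_{3} are conditionally independent given X, so each row of the clustered CPT is a product of the three original CPTs. A minimal sketch of this computation (the function name is illustrative; the CPT values are those of the original network):

```python
from itertools import product

# p_y[i][x] = P(Y_{i+1} = 1 | X = x), from the original network's CPTs
p_y = [(0.75, 0.50), (0.50, 0.25), (0.25, 0.75)]

def cluster_cpt(x):
    """Return P(Y = (y1, y2, y3) | X = x) for all 8 joint settings."""
    table = {}
    for ys in product((0, 1), repeat=3):
        prob = 1.0
        for i, y in enumerate(ys):
            p1 = p_y[i][x]
            prob *= p1 if y == 1 else 1.0 - p1   # Bernoulli factor for Y_{i+1}
        table[ys] = prob
    return table
```

The columns P(Z_{1} = 1 | Y) and P(Z_{2} = 1 | Y) carry over directly from the original table, since each cluster state fixes the joint setting of (Y_{1}, Y_{2}, Y_{3}).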
3.6 Likelihood weighting
Consider the belief network shown below, with n binary random variables B_{i} ∈ {0, 1} and an integer random variable Z. Let

f(B) = Σ_{i=1}^{n} 2^{i−1} B_{i}

denote the nonnegative integer whose binary representation is given by B_{n} B_{n−1} ⋯ B_{2} B_{1}. Suppose that each bit has prior probability P(B_{i} = 1) = 1/2, and that

P(Z | B_{1}, B_{2}, …, B_{n}) = [(1 − α)/(1 + α)] α^{|Z − f(B)|},

where 0 < α < 1 is a parameter measuring the amount of noise in the conversion from binary to decimal. (Larger values of α indicate greater levels of noise.)
[Figure: bit nodes B_{1}, …, B_{n}, each a parent of the integer node Z]
(a) Show that the conditional distribution for binary-to-decimal conversion is normalized; namely, that Σ_{z} P(Z = z | B_{1}, B_{2}, …, B_{n}) = 1, where the sum is over all integers z ∈ (−∞, +∞).

(b) Consider a network with n = 10 bits and noise level α = 0.1. Use the method of likelihood weighting to estimate the probability P(B_{i} = 1 | Z = 128) for i ∈ {2, 5, 8, 10}.

(c) Plot your estimates in part (b) as a function of the number of samples. You should be confident from the plots that your estimates have converged to a good degree of precision (say, at least two significant digits).

(d) Submit a hardcopy printout of your source code. You may program in the language of your choice, and you may use any program at your disposal to plot the results.
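A sketch of how the likelihood weighting estimate in part (b) might be implemented, assuming the CPT P(Z = z | B) = [(1 − α)/(1 + α)] α^{|z − f(B)|}; the function name, defaults, and fixed seed are illustrative choices, not part of the assignment:

```python
import random

def likelihood_weighting(n=10, alpha=0.1, z=128, num_samples=100_000, seed=0):
    """Estimate P(B_i = 1 | Z = z) for every i by likelihood weighting.

    The bits (the non-evidence nodes) are sampled from their priors,
    and each sample is weighted by the likelihood P(Z = z | B)."""
    rng = random.Random(seed)
    num = [0.0] * n          # weighted counts of samples with B_{i+1} = 1
    den = 0.0                # total weight of all samples
    for _ in range(num_samples):
        bits = [rng.randint(0, 1) for _ in range(n)]   # bits[i] is B_{i+1}
        f = sum(b << i for i, b in enumerate(bits))    # f(B) = sum_i 2^{i-1} B_i
        w = (1 - alpha) / (1 + alpha) * alpha ** abs(z - f)
        den += w
        for i, b in enumerate(bits):
            num[i] += w * b
    return [c / den for c in num]    # index i holds the estimate for B_{i+1}
```

Since 128 = 2^{7}, one would expect the estimate for B_{8} to be large and the estimates for the other bits to stay small as the number of samples grows.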
3.7 Even more inference
Show how to perform the desired inference in each of the belief networks shown below. Briefly justify each step in your calculations.

(a) Markov blanket
Show how to compute the posterior probability P(B | A, C, D) in terms of the CPTs of this belief network, namely P(A), P(B|A), P(C), and P(D|B,C).

(b) Conditional independence
This belief network has conditional probability tables for P(F|A) and P(E|C) in addition to those of the previous problem. Assuming that all these tables are given, show how to compute the posterior probability P(B | A, C, D, E, F) in terms of these additional CPTs and your answer to part (a).

(c) More conditional independence
Assuming that all the conditional probability tables in this belief network are given, show how to compute the posterior probability P(B, E, F | A, C, D). Express your answer in terms of the CPTs of the network, as well as your earlier answers for parts (a) and (b).
[Figures: for part (a), a network over A, B, C, D with CPTs P(A), P(B|A), P(C), and P(D|B,C); for parts (b) and (c), the same network with the additional nodes F (a child of A) and E (a child of C)]
3.8 More likelihood weighting

(a) Single node of evidence
[Figure: belief network over Q, X, Y, Z, and the evidence node E]
Suppose that T samples {q_{t}, x_{t}, y_{t}, z_{t}}_{t=1}^{T} are drawn from the CPTs of the belief network shown above (with fixed evidence E = e). Show how to estimate P(Q = q | E = e) from these samples using the method of likelihood weighting. Express your answer in terms of sums over indicator functions, such as:

I(q, q′) = 1 if q = q′, and 0 otherwise.
In addition, all probabilities in your answer should be expressed in terms of CPTs of the belief network (i.e., probabilities that do not require any additional computation).

(b) Multiple nodes of evidence
Suppose that T samples {q_{1t}, q_{2t}, x_{t}, y_{t}, z_{t}}_{t=1}^{T} are drawn from the CPTs of the network shown below (with fixed evidence E_{1} = e_{1} and E_{2} = e_{2}). Show how to estimate P(Q_{1} = q_{1}, Q_{2} = q_{2} | E_{1} = e_{1}, E_{2} = e_{2}) from these samples using the method of likelihood weighting. Express your answer in terms of indicator functions and CPTs of the belief network.
[Figure: belief network over Q_{1}, Q_{2}, X, Y, Z, and the evidence nodes E_{1}, E_{2}]