Problem 1 (40 points): HMM, Search and Rescue

You are an interplanetary search and rescue expert who has just received an urgent message: a rover on Mercury has fallen and become trapped in Death Ravine, a deep, narrow gorge on the borders of enemy territory. You zoom over to Mercury to investigate the situation. Death Ravine is a narrow gorge 6 miles long, as shown below. There are volcanic vents at locations A and D, indicated by the triangular symbols at those locations.

[Figure: the six mile-markers A B C D E F of Death Ravine, with triangular vent symbols at A and D.]

The rover was heavily damaged in the fall, and as a result, most of its sensors are broken. The only ones still functioning are its thermometers, which register only two levels: hot and cold. The rover sends back evidence E = hot when it is at a volcanic vent (A and D), and E = cold otherwise. There is no chance of a mistaken reading. The rover fell into the gorge at position A on day 1, so X_{1} = A. Let the rover's position on day t be X_{t} ∈ {A, B, C, D, E, F}. The rover is still executing its original programming, trying to move 1 mile east (i.e., right, towards F) every day. However, because of the damage, it only moves east with probability 0.80, and it stays in place with probability 0.20. Your job is to figure out where the rover is, so that you can dispatch your rescuebot.
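The dynamics above can be written down as a small transition/emission model. The sketch below uses names of my own choosing; it additionally assumes the rover simply stays put once it reaches F, since the problem does not say what happens at the east end of the ravine.

```python
# Sketch of the Death Ravine HMM (variable names are my own).
# States are the six positions A..F, indexed 0..5.
STATES = ["A", "B", "C", "D", "E", "F"]

# T[i][j] = P(X_{t+1} = state j | X_t = state i): the rover moves east
# with probability 0.80 and stays in place with probability 0.20.
T = [[0.0] * 6 for _ in range(6)]
for i in range(5):
    T[i][i] = 0.20      # stays in place
    T[i][i + 1] = 0.80  # moves one mile east
T[5][5] = 1.0           # assumption: the rover stays put at F

# Deterministic sensor model: hot exactly at the vents A and D.
P_HOT = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```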

Filtering: Three days have passed since the rover fell into the ravine. The observations were (E_{1} = hot, E_{2} = cold, E_{3} = cold). What is P(X_{3} | hot_{1}, cold_{2}, cold_{3}), the probability distribution over the rover's position on day 3, given the observations? (This is a probability distribution over the six possible positions.)
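As a sanity check on the hand computation, the forward (filtering) recursion can be run numerically. This is a sketch under two assumptions of mine: the rover stays put at F, and day 1 starts from the known position A; verify the recursion itself against the lecture formulas.

```python
# Forward-algorithm sketch for P(X_3 | hot_1, cold_2, cold_3).
T = [[0.0] * 6 for _ in range(6)]
for i in range(5):
    T[i][i], T[i][i + 1] = 0.20, 0.80   # stay / move east
T[5][5] = 1.0                           # assumed: stuck at F
P_HOT = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # hot only at vents A and D

def likelihood(e):
    """P(E_t = e | X_t) for each of the six states."""
    return P_HOT if e == "hot" else [1.0 - p for p in P_HOT]

def filtering(evidence):
    """Return P(X_t | e_1..e_t) over states A..F for t = len(evidence)."""
    # Day 1: X_1 = A is given, then weight by the day-1 observation.
    f = [l * p for l, p in zip(likelihood(evidence[0]),
                               [1.0, 0, 0, 0, 0, 0])]
    for e in evidence[1:]:
        # Elapse time: push the belief through the transition model.
        pred = [sum(f[i] * T[i][j] for i in range(6)) for j in range(6)]
        # Observe: weight by the evidence likelihood.
        f = [l * p for l, p in zip(likelihood(e), pred)]
    z = sum(f)
    return [p / z for p in f]

print(filtering(["hot", "cold", "cold"]))
```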

Smoothing: What is P(X_{2} | hot_{1}, cold_{2}, cold_{3}), the probability distribution over the rover's position on day 2, given the observations? (This is a probability distribution over the six possible positions.)
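The smoothed distribution can likewise be sanity-checked with a small forward-backward sketch, under the same assumptions as before (known start at A, rover stuck at F):

```python
# Forward-backward sketch for P(X_2 | hot_1, cold_2, cold_3).
T = [[0.0] * 6 for _ in range(6)]
for i in range(5):
    T[i][i], T[i][i + 1] = 0.20, 0.80   # stay / move east
T[5][5] = 1.0                           # assumed: stuck at F
P_HOT = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

def likelihood(e):
    return P_HOT if e == "hot" else [1.0 - p for p in P_HOT]

evidence = ["hot", "cold", "cold"]

# Forward message for t = 2: f_2 proportional to P(X_2, e_1, e_2).
f1 = [l * p for l, p in zip(likelihood(evidence[0]), [1.0, 0, 0, 0, 0, 0])]
pred = [sum(f1[i] * T[i][j] for i in range(6)) for j in range(6)]
f2 = [l * p for l, p in zip(likelihood(evidence[1]), pred)]

# Backward message for t = 2: b_2(i) = sum_j T[i][j] * P(e_3 | j).
b2 = [sum(T[i][j] * likelihood(evidence[2])[j] for j in range(6))
      for i in range(6)]

# Smoothed posterior is the normalized product of the two messages.
smoothed = [fi * bi for fi, bi in zip(f2, b2)]
z = sum(smoothed)
posterior = [s / z for s in smoothed]
print(posterior)
```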

Prediction: What is P(hot_{4} | hot_{1}, cold_{2}, cold_{3}), the probability of observing hot_{4} on day 4 given the observations on days 1, 2, and 3? (This is a single value, not a distribution.)

Prediction: You decide to attempt to rescue the rover on day 4. However, the transmission of E_{4} seems to have been corrupted, and so it is not observed. What is the rover's position distribution for day 4 given the same evidence, P(X_{4} | hot_{1}, cold_{2}, cold_{3})?
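Both prediction questions amount to one extra elapse-time step: push the day-3 filtered belief through the transition model to get P(X_{4} | e_{1:3}); P(hot_{4} | e_{1:3}) is then just the mass on the vent squares. A sketch, under the same assumptions as before:

```python
# Prediction sketch: P(X_4 | e_{1:3}) and P(hot_4 | e_{1:3}).
T = [[0.0] * 6 for _ in range(6)]
for i in range(5):
    T[i][i], T[i][i + 1] = 0.20, 0.80   # stay / move east
T[5][5] = 1.0                           # assumed: stuck at F
P_HOT = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

def likelihood(e):
    return P_HOT if e == "hot" else [1.0 - p for p in P_HOT]

# Filter days 1-3 with the forward algorithm, starting from X_1 = A.
f = [l * p for l, p in zip(likelihood("hot"), [1.0, 0, 0, 0, 0, 0])]
for e in ["cold", "cold"]:
    pred = [sum(f[i] * T[i][j] for i in range(6)) for j in range(6)]
    f = [l * p for l, p in zip(likelihood(e), pred)]
z = sum(f)
f = [p / z for p in f]

# One elapse-time step with no day-4 observation gives P(X_4 | e_{1:3}).
x4 = [sum(f[i] * T[i][j] for i in range(6)) for j in range(6)]
# P(hot_4 | e_{1:3}) is the predicted mass on the hot (vent) squares.
p_hot4 = sum(x4[j] * P_HOT[j] for j in range(6))
print(x4, p_hot4)
```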

Bonus Question for Extra Credit: What is P(hot_{4}, hot_{5}, cold_{6} | hot_{1}, cold_{2}, cold_{3}), the probability of observing hot_{4}, hot_{5}, and cold_{6} on days 4, 5, and 6, respectively, given the observations on days 1, 2, and 3? (This is a single value, not a distribution.) Note: the answer is a little long; I don't recommend attempting this question unless you have already answered everything else. The max score for HW3, including all extra credit (LaTeX and this question), is 110.
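Numerically, this quantity is the total unnormalized probability mass left after three more elapse/observe steps starting from the day-3 posterior. The sketch below (same assumptions as before) is only a way to check your result; the by-hand derivation the question asks for is still the long part.

```python
# Sketch: P(hot_4, hot_5, cold_6 | hot_1, cold_2, cold_3).
T = [[0.0] * 6 for _ in range(6)]
for i in range(5):
    T[i][i], T[i][i + 1] = 0.20, 0.80   # stay / move east
T[5][5] = 1.0                           # assumed: stuck at F
P_HOT = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

def likelihood(e):
    return P_HOT if e == "hot" else [1.0 - p for p in P_HOT]

def step(f, e):
    """One elapse-time step followed by weighting with evidence e."""
    pred = [sum(f[i] * T[i][j] for i in range(6)) for j in range(6)]
    return [l * p for l, p in zip(likelihood(e), pred)]

# Filtered belief for day 3, starting from the known X_1 = A.
f = [l * p for l, p in zip(likelihood("hot"), [1.0, 0, 0, 0, 0, 0])]
for e in ["cold", "cold"]:
    f = step(f, e)
z = sum(f)
f = [p / z for p in f]

# Fold in the three future observations; the leftover mass is the answer,
# since we normalized on e_{1:3} and never normalize again.
for e in ["hot", "hot", "cold"]:
    f = step(f, e)
ans = sum(f)
print(ans)
```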
You need to apply the formulas covered in lecture to get to the answers; do not just guess them. There are a lot of calculations involved, but most of them result in zeros. To save time, if a product of probabilities contains a factor that is 0, just write 0 rather than writing out the entire product.
Problem 2 (10 points): Consider the Markov Decision Process (MDP) with the transition probabilities and reward function given in the tables below. Assume the discount factor γ = 1 (i.e., there is no actual discounting).
s   a   s'   T(s, a, s')
A   1   A    1
A   1   B    0
A   2   A    0.5
A   2   B    0.5
B   1   A    0
B   1   B    1
B   2   A    0
B   2   B    1

s   a   R(s, a)
A   1   0
A   2   1
B   1   5
B   2   0
We follow the steps of the Policy Iteration algorithm as explained in class.

Write down the Bellman equation.
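For reference, with the R(s, a) reward convention used in the tables above and discount γ, the fixed-policy Bellman equation (the one policy evaluation solves) is typically written as below; check it against the exact form given in lecture:

```latex
% Fixed-policy Bellman equation, R(s, a) reward convention:
V^{\pi}(s) = R\bigl(s, \pi(s)\bigr) + \gamma \sum_{s'} T\bigl(s, \pi(s), s'\bigr)\, V^{\pi}(s')

% Its iterative (successive-approximation) form, with gamma = 1 here:
V_{k+1}(s) = R\bigl(s, \pi(s)\bigr) + \gamma \sum_{s'} T\bigl(s, \pi(s), s'\bigr)\, V_{k}(s')
```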

The initial policy is π(A) = 1 and π(B) = 1; that is, action 1 is taken when in state A, and the same action is taken when in state B as well. Calculate the values V_{2}(A) and V_{2}(B) from two iterations of policy evaluation (Bellman equation) after initializing both V_{0}(A) and V_{0}(B) to 0.
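These two evaluation sweeps are small enough to verify with a few lines of code. A sketch (variable names are my own; the tables are transcribed from above):

```python
# Two sweeps of policy evaluation for the MDP above, with gamma = 1
# and the fixed policy pi(A) = pi(B) = 1, starting from V_0 = 0.
T = {("A", 1): {"A": 1.0, "B": 0.0},
     ("A", 2): {"A": 0.5, "B": 0.5},
     ("B", 1): {"A": 0.0, "B": 1.0},
     ("B", 2): {"A": 0.0, "B": 1.0}}
R = {("A", 1): 0.0, ("A", 2): 1.0, ("B", 1): 5.0, ("B", 2): 0.0}
pi = {"A": 1, "B": 1}
gamma = 1.0

V = {"A": 0.0, "B": 0.0}  # V_0
for k in range(2):
    # Bellman update for the fixed policy; the comprehension is built
    # entirely from the previous V, so each sweep reads the old values.
    V = {s: R[s, pi[s]] + gamma * sum(T[s, pi[s]][t] * V[t] for t in V)
         for s in V}
print(V)  # V_2
```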

Find an improved policy π_{new} based on the calculated values V_{2}(A) and V_{2}(B).
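The improvement step is a greedy one-step lookahead over the Q-values. A sketch that recomputes V_{2} and then improves (again with names of my own choosing):

```python
# One policy-improvement step after two evaluation sweeps (gamma = 1).
T = {("A", 1): {"A": 1.0, "B": 0.0},
     ("A", 2): {"A": 0.5, "B": 0.5},
     ("B", 1): {"A": 0.0, "B": 1.0},
     ("B", 2): {"A": 0.0, "B": 1.0}}
R = {("A", 1): 0.0, ("A", 2): 1.0, ("B", 1): 5.0, ("B", 2): 0.0}
pi = {"A": 1, "B": 1}

# Two sweeps of policy evaluation from V_0 = 0, as in the previous part.
V = {"A": 0.0, "B": 0.0}
for _ in range(2):
    V = {s: R[s, pi[s]] + sum(T[s, pi[s]][t] * V[t] for t in V) for s in V}

# Greedy improvement: for each state, take the action with the highest
# one-step lookahead value Q(s, a) = R(s, a) + sum_s' T(s, a, s') V(s').
pi_new = {}
for s in ("A", "B"):
    q = {a: R[s, a] + sum(T[s, a][t] * V[t] for t in V) for a in (1, 2)}
    pi_new[s] = max(q, key=q.get)
print(pi_new)
```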