Description
Task: Work through each set of exercises. If you get stuck on any of the exercises, you can ask Yi or me for help by email or during office hours.
What to submit: Submit your answers for all of the exercises in this document to the appropriate dropbox on the Carmen site. Answers for the concept check and proof sections can be handwritten (e.g., submitted as a scanned image), but please make sure that your writing is readable. Answers to the coding section must be written in Python and must be runnable by the grader.
Due date: Submit your answers to the Carmen dropbox by 11:59pm, Jun. 27th.
Concept check

1. (2pt) Using the alarm network on slide 3 of the Bayesian Inference slides, compute P(B | +j, +m).
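A sketch of how the exercise-1 computation can be checked by inference by enumeration. The CPT values below are the classic textbook parameters for the burglary-alarm network, used here only as stand-in assumptions; verify them against slide 3 before relying on the numbers.

```python
# Inference by enumeration for P(B | +j, +m) in the burglary-alarm network.
# CPTs below are the standard textbook values (an assumption -- check the
# slides), not necessarily those on slide 3.
from itertools import product

P_B = {True: 0.001, False: 0.999}               # P(+b)
P_E = {True: 0.002, False: 0.998}               # P(+e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(+a | B, E)
P_J = {True: 0.90, False: 0.05}                 # P(+j | A)
P_M = {True: 0.70, False: 0.01}                 # P(+m | A)

def posterior_burglary():
    """P(B | +j, +m): sum the full joint over the hidden variables E and A."""
    unnorm = {}
    for b in (True, False):
        total = 0.0
        for e, a in product((True, False), repeat=2):
            pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * pa * P_J[a] * P_M[a]
        unnorm[b] = total
    z = sum(unnorm.values())
    return {b: p / z for b, p in unnorm.items()}

print(posterior_burglary())
```

With the textbook CPTs this gives P(+b | +j, +m) of roughly 0.284; with the slide's CPTs the structure of the computation is the same even if the numbers differ.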

2. (3pt) Refer to the Naive Bayes Classifier shown below. Suppose C has domain {c_1, c_2, c_3} and each X_i is a Boolean variable with values true and false. Using the Bayesian net G, compute the following distribution, showing how you derived your answer:
P(C | X_1 = false, X_2 = true, X_3 = false).
3. (3pt) The sigmoid function

s(z) = 1 / (1 + e^{-z})

C :: P(c_1) = 0.3, P(c_2) = 0.5

X_1 :: P(true | c_1) = 0.7, P(true | c_2) = 0.4, P(true | c_3) = 0.2
X_2 :: P(true | c_1) = 0.9, P(true | c_2) = 0.5, P(true | c_3) = 0.7
X_3 :: P(true | c_1) = 0.6, P(true | c_2) = 0.4, P(true | c_3) = 0.2

Figure 1: A Naive Bayes Classifier. In the network G, the class node C is the parent of X_1, X_2, and X_3.
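A sketch of the exercise-2 computation as code. The CPT values are my best reading of the garbled Figure 1 (with P(c_3) = 0.2 filled in so the prior sums to 1) and should be treated as assumptions to verify against the original figure.

```python
# Naive Bayes posterior P(C | X1=false, X2=true, X3=false).
# CPT values are assumptions recovered from a garbled figure -- verify them.
prior = {"c1": 0.3, "c2": 0.5, "c3": 0.2}
p_true = {  # P(X_i = true | c) for each feature
    "X1": {"c1": 0.7, "c2": 0.4, "c3": 0.2},
    "X2": {"c1": 0.9, "c2": 0.5, "c3": 0.7},
    "X3": {"c1": 0.6, "c2": 0.4, "c3": 0.2},
}

def naive_bayes_posterior(evidence):
    """evidence maps feature name -> observed Boolean value."""
    unnorm = {}
    for c, pc in prior.items():
        score = pc
        for feat, val in evidence.items():
            score *= p_true[feat][c] if val else 1.0 - p_true[feat][c]
        unnorm[c] = score
    z = sum(unnorm.values())
    return {c: s / z for c, s in unnorm.items()}

print(naive_bayes_posterior({"X1": False, "X2": True, "X3": False}))
```

The pattern (prior times each class-conditional likelihood, then normalize) is the derivation the exercise asks you to show by hand.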
has derivative s'(z) = s(z)(1 - s(z)). Moreover, recall that during backpropagation the derivative s'(z) is a factor in the gradient computation used to update the weights of a multilayer perceptron (see slides 28-30 in the neuralnets.pdf slide set). Activation functions like the sigmoid have a "saturation" problem: when z is very large or very small, s(z) is close to 1 or 0, respectively, and so s'(z) is close to 0. As a result, the corresponding gradients will be nearly 0, which slows down training. Affine activation functions with positive slope always have a positive derivative and thus will (more or less) not exhibit saturation, but they have other drawbacks (think back to lab 6). Do a little research and find a non-affine activation function that avoids the saturation problem (hint: ReLU). In your own words, describe how this activation is non-affine and also avoids the saturation problem. Briefly discuss any drawbacks your chosen activation function may have, as well as similar alternatives that avoid these drawbacks.
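A quick numerical illustration of the saturation described above: the sigmoid's derivative collapses toward 0 for large |z|, while the ReLU derivative stays at 1 on the positive side. (This only illustrates the phenomenon; the exercise's written answer is still yours to give.)

```python
# Compare gradient magnitudes of sigmoid vs. ReLU as |z| grows.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # s'(z) = s(z) * (1 - s(z)), as stated in the exercise
    return sigmoid(z) * (1.0 - sigmoid(z))

def relu_grad(z):
    # derivative of max(0, z); undefined at 0, where 0 is a common convention
    return 1.0 if z > 0 else 0.0

for z in (0.0, 5.0, 10.0):
    print(z, sigmoid_grad(z), relu_grad(z))
# sigmoid_grad(10) is about 4.5e-05, while relu_grad(10) is still 1.0
```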
Coding

1. (8pt) Implement in Python a convolutional layer (without identity activation) that computes the application of the 3×3 vertical and horizontal Sobel masks below to an input image of size 5×5×3 with zero-padding of size 1. That is, the weights of your convolutional layer will not be learned, but rather hard-coded to match the values of the filters. To make things concrete, use the input volume and masks below:

input volume: a 5×5×3 array of small integers (the entries and their layout were garbled during text extraction; use the values shown in the figure in the original handout)

vertical mask:
-1  0  1
-2  0  2
-1  0  1

horizontal mask:
-1 -2 -1
 0  0  0
 1  2  1
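A minimal sketch of the fixed-weight convolutional layer, assuming the standard 3×3 Sobel kernels and using a random 5×5×3 placeholder input (the handout's actual input values did not survive extraction, so substitute them in):

```python
# Fixed-weight convolution: apply each Sobel mask to a 5x5x3 input with
# zero-padding of 1, summing contributions across the 3 channels.
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 10, size=(5, 5, 3)).astype(float)  # placeholder input

sobel_v = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_h = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def conv_layer(img, kernel, pad=1):
    """Apply one 3x3 kernel to every channel of img with zero-padding,
    summing across channels (one output map per kernel, no activation)."""
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]  # 3x3xC receptive field
            out[i, j] = sum(np.sum(patch[:, :, k] * kernel) for k in range(c))
    return out

vertical_map = conv_layer(image, sobel_v)
horizontal_map = conv_layer(image, sobel_h)
print(vertical_map.shape, horizontal_map.shape)  # (5, 5) (5, 5)
```

With padding 1 and stride 1, each mask produces a 5×5 output map, so the layer emits a 5×5×2 volume overall.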

2. (9pt) Implement a perceptron that can learn the Boolean function AND using the threshold activation function.
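One way this exercise can be sketched: a single perceptron with a threshold (step) activation, trained with the perceptron update rule on the four AND examples. The learning rate and epoch count here are arbitrary choices.

```python
# Perceptron with threshold activation learning Boolean AND.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def step(z):
    return 1 if z >= 0 else 0  # threshold activation

def train_perceptron(data, lr=0.1, epochs=50):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            y = step(w[0] * x1 + w[1] * x2 + b)
            err = target - y  # perceptron rule: w += lr * (t - y) * x
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

w, b = train_perceptron(DATA)
for (x1, x2), target in DATA:
    print((x1, x2), step(w[0] * x1 + w[1] * x2 + b))  # matches AND
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this training loop settles on weights that classify all four inputs correctly.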
Fun with proofs

1. (5pt) Prove that a multilayer perceptron with one hidden layer containing two neurons and an output layer with one neuron is an affine function of the input if the activation function for each neuron is an affine function. To keep things simple and concrete, you need only demonstrate the result for the MLP shown below. Briefly explain the implications of this result for using multilayer perceptrons with affine activation functions to learn the XOR data.
Figure 2: Multilayer Perceptron
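The algebraic core of the argument can be sketched as follows; the symbols a, b, c, d, w_i, u_i, v_i are generic placeholders (not values taken from Figure 2), and this is a hint rather than the full requested proof.

```latex
% A composition of affine maps is itself affine. With hidden activation
% h(z) = a z + b and output activation g(z) = c z + d:
\begin{align*}
h_i &= a\,(\mathbf{w}_i^{\top}\mathbf{x} + u_i) + b, \qquad i = 1, 2,\\
y   &= g(v_1 h_1 + v_2 h_2 + u_3)
     = c\,a\,(v_1 \mathbf{w}_1 + v_2 \mathbf{w}_2)^{\top}\mathbf{x}
       + \text{constant},
\end{align*}
% so the network output y is again an affine function of the input x.
```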