Homework 3 Solution


Task: Work through each set of exercises. If you get stuck on any of the exercises you can ask Yi or me for help by email or during office hours.

What to submit: Submit your answers for all of the exercises in this document to the appropriate dropbox on the Carmen site. Answers for the concept check and proof sections can be hand-written (e.g., submitted as a scanned image), but please make sure that your writing is readable. Answers to the coding section must be written in Python and must be runnable by the grader.

Due date: Submit your answers to the Carmen dropbox by 11:59pm, Jun. 27th.

Concept check

1. (2pt) Using the alarm network on slide 3 of the Bayesian Inference slides, compute P(B | +j, +m).
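As a sanity check for this exercise, inference by enumeration can be sketched as below. The CPT values are the standard burglary/alarm network from Russell and Norvig; if the slide uses different numbers, substitute them.

```python
# Exact inference by enumeration for P(B | +j, +m), assuming the standard
# burglary/alarm network CPTs from Russell & Norvig (the slide's values may differ).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(+a | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(+j | a)
P_M = {True: 0.70, False: 0.01}   # P(+m | a)

def joint(b):
    """Sum out E and A to get P(b, +j, +m)."""
    pb = P_B if b else 1 - P_B
    total = 0.0
    for e in (True, False):
        pe = P_E if e else 1 - P_E
        for a in (True, False):
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += pe * pa * P_J[a] * P_M[a]
    return pb * total

posterior = joint(True) / (joint(True) + joint(False))
print(round(posterior, 4))  # ~0.2842 with these CPTs
```

The same enumeration can be done by hand: fix B, sum the product of the CPT entries over the two hidden variables E and A, then normalize over B.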

2. (3pt) Refer to the Naive Bayes Classifier shown below. Suppose C has domain {c1, c2, c3} and each Xi is a Boolean variable with values true and false. Using the Bayesian net G, compute the following distribution, showing the manner in which you derived your answer.

P(C | X1 = false, X2 = true, X3 = false).

3. (3pt) The sigmoid function

s(z) = 1 / (1 + e^(-z))

The network G has class node C with children X1, X2, and X3, and the following CPTs:

C:  P(c1) = 0.3, P(c2) = 0.5

X1: P(true|c1) = 0.7, P(true|c2) = 0.4, P(true|c3) = 0.2

X2: P(true|c1) = 0.9, P(true|c2) = 0.5, P(true|c3) = 0.7

X3: P(true|c1) = 0.6, P(true|c2) = 0.4, P(true|c3) = 0.2

Figure 1: A Naive Bayes Classifier.
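A sketch of the required computation, assuming the Figure 1 CPTs read P(c1) = 0.3, P(c2) = 0.5 (so P(c3) = 0.2, since the prior must sum to 1), P(X1 = true | c) = 0.7/0.4/0.2, P(X2 = true | c) = 0.9/0.5/0.7, and P(X3 = true | c) = 0.6/0.4/0.2; check these against your copy of the figure.

```python
# Naive Bayes posterior: P(C | x1, x2, x3) is proportional to P(C) * prod_i P(xi | C).
# CPT values below are read from Figure 1 as described above; P(c3) = 0.2 is
# inferred so that the prior sums to 1.
prior = {"c1": 0.3, "c2": 0.5, "c3": 0.2}
p_true = {  # P(Xi = true | C)
    "X1": {"c1": 0.7, "c2": 0.4, "c3": 0.2},
    "X2": {"c1": 0.9, "c2": 0.5, "c3": 0.7},
    "X3": {"c1": 0.6, "c2": 0.4, "c3": 0.2},
}

def posterior(x1, x2, x3):
    """Return the normalized distribution over C given Boolean observations."""
    obs = {"X1": x1, "X2": x2, "X3": x3}
    scores = {}
    for c, pc in prior.items():
        s = pc
        for var, val in obs.items():
            pt = p_true[var][c]
            s *= pt if val else 1 - pt   # P(Xi = false | c) = 1 - P(Xi = true | c)
        scores[c] = s
    z = sum(scores.values())             # normalization constant
    return {c: s / z for c, s in scores.items()}

print(posterior(False, True, False))
```

A hand derivation follows the same shape: compute the three unnormalized products, then divide each by their sum.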

has derivative s'(z) = s(z)(1 - s(z)). Moreover, recall that during backpropagation the derivative s'(z) is a factor in the gradient computation used to update the weights of a multilayer perceptron (see slides 28-30 in the neural-nets.pdf slide set). Activation functions like the sigmoid have a "saturation" problem: when z is very large or very small, s(z) is close to 1 or 0, respectively, and so s'(z) is close to 0. As a result, the corresponding gradients will be nearly 0, which slows down training. Affine activation functions with positive slope always have a positive derivative and thus will (more or less) not exhibit saturation, but they have other drawbacks (think back to lab 6). Do a little research and find a non-affine activation function that avoids the saturation problem (hint: ReLU). In your own words, describe how this activation is non-affine and also avoids the saturation problem. Briefly discuss any drawbacks your chosen activation function may have, as well as similar alternatives that avoid these drawbacks.
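The saturation effect described above is easy to verify numerically. A minimal comparison of the sigmoid and ReLU derivatives at a few values of z:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # s'(z) = s(z) * (1 - s(z)); vanishes as |z| grows.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # ReLU derivative: 1 for z > 0, 0 for z < 0 (0 at z == 0 by convention).
    return 1.0 if z > 0 else 0.0

for z in (-10.0, 0.0, 10.0):
    print(f"z={z:+.0f}  sigmoid'={sigmoid_grad(z):.2e}  relu'={relu_grad(z):.0f}")
```

At z = 10 the sigmoid gradient is below 1e-4 while the ReLU gradient is still exactly 1, which is the saturation argument in numbers.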

Coding

1. (8pt) Implement in Python a convolutional layer (without identity activation) that computes the application of the 3×3 vertical and horizontal Sobel masks below to an input image of size 5×5×3 with zero-padding of size 1. That is, the weights of your convolutional layer will not be learned, but rather hard-coded to match the values of the filters. To make things concrete, use the input volume and masks below:

input volume: a 5×5×3 array of small integers (the per-channel values are given in the original assignment figure)

vertical mask:

-1  0  1
-2  0  2
-1  0  1

horizontal mask:

-1  -2  -1
 0   0   0
 1   2   1
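A minimal NumPy sketch of such a fixed-weight layer, assuming the standard Sobel sign conventions; the `volume` below is a random placeholder, so substitute the actual values from the assignment figure:

```python
import numpy as np

# Fixed-weight convolutional layer: 3x3 Sobel masks, zero-padding 1, stride 1,
# applied to a 5x5x3 volume. Replace `volume` with the assignment's values.
rng = np.random.default_rng(0)
volume = rng.integers(0, 10, size=(5, 5, 3)).astype(float)

vertical = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)
horizontal = np.array([[-1, -2, -1],
                       [ 0,  0,  0],
                       [ 1,  2,  1]], dtype=float)

def conv_layer(vol, mask, pad=1):
    """Apply one 3x3 mask to every channel and sum over channels (one filter
    whose three depth slices are all `mask`), giving a 5x5 activation map."""
    h, w, c = vol.shape
    padded = np.zeros((h + 2 * pad, w + 2 * pad, c))
    padded[pad:-pad, pad:-pad, :] = vol          # zero-padding of size 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]  # 3x3x3 receptive field
            out[i, j] = sum((patch[:, :, k] * mask).sum() for k in range(c))
    return out

v_map = conv_layer(volume, vertical)
h_map = conv_layer(volume, horizontal)
print(v_map.shape, h_map.shape)  # (5, 5) (5, 5)
```

Note the sanity check built into the design: convolving a constant image with either Sobel mask gives 0 in the interior, since each mask's entries sum to 0.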

2. (9pt) Implement a perceptron that can learn the Boolean function AND using the threshold activation function.
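One way to structure this exercise is sketched below; the learning rate and epoch count are arbitrary choices, and AND is linearly separable, so the perceptron convergence theorem guarantees the rule finds a solution:

```python
# Perceptron with threshold (step) activation trained on Boolean AND.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def step(z):
    # Threshold activation: fire iff the weighted sum is non-negative.
    return 1 if z >= 0 else 0

def train(data, lr=0.1, epochs=25):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            pred = step(w[0] * x1 + w[1] * x2 + b)
            err = target - pred          # perceptron update rule
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

w, b = train(data)
for (x1, x2), target in data:
    print(x1, x2, step(w[0] * x1 + w[1] * x2 + b))
```

After training, the learned weights classify all four AND inputs correctly; with these settings convergence happens within the first several epochs.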

Fun with proofs

1. (5pt) Prove that a multilayer perceptron with one hidden layer of two neurons and an output layer with one neuron is an affine function of the input if the activation function for each neuron is an affine function. To make things simple and concrete, you need only demonstrate the result for the MLP shown below. Briefly explain the implications of this result for using multilayer perceptrons with affine activation functions to learn the XOR data.

Figure 2: Multilayer Perceptron
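One way to lay out the argument, assuming activations of the affine form σ(z) = az + c and hypothetical weight names (w for the hidden layer, v for the output; Figure 2's own labels are not visible in this copy):

```latex
% Hidden units: affine activation applied to affine pre-activations.
h_i = \sigma(w_{i1} x_1 + w_{i2} x_2 + b_i)
    = a(w_{i1} x_1 + w_{i2} x_2 + b_i) + c, \qquad i = 1, 2.
% Each h_i is affine in (x_1, x_2). The output unit has the same form:
y = \sigma(v_1 h_1 + v_2 h_2 + b_3) = a(v_1 h_1 + v_2 h_2 + b_3) + c.
% Substituting the affine h_i and collecting terms:
y = a^2 (v_1 w_{11} + v_2 w_{21})\, x_1
  + a^2 (v_1 w_{12} + v_2 w_{22})\, x_2 + \beta,
% where \beta collects all the constant terms. Hence y = \alpha_1 x_1 + \alpha_2 x_2 + \beta
% is affine in the input: the network can only realize linear decision
% boundaries, and XOR is not linearly separable.
```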


