Description

(30 points) Consider the following Multilayer Perceptron (MLP) for binary classification:

[Figure: an MLP with inputs x_0 = 1 (bias), x_1, x_2; hidden units z_0 = 1 (bias), z_1, z_2; and a single output y.]

We have the following error function:

E(w_1, w_2, v | X) = - Σ_t [ r^t log y^t + (1 - r^t) log(1 - y^t) ],

where

y^t = sigmoid(v_2 z_2^t + v_1 z_1^t + v_0),
z_1^t = sigmoid(w_{1,2} x_2^t + w_{1,1} x_1^t + w_{1,0}), and
z_2^t = LReLU(w_{2,2} x_2^t + w_{2,1} x_1^t + w_{2,0}),

where the leaky rectified linear unit LReLU(x) is defined as

LReLU(x) = 0.01x, for x < 0
           x,     otherwise

Instructor: Rui Kuang (kuang@cs.umn.edu). TAs: Jungseok Hong (jungseok@umn.edu) and Ujval Bangalore Umesh (banga038@umn.edu).

Derive the equations for updating {w_1, w_2, v} of the above MLP.

Now, consider shared weights w = w_1 = w_2. Derive the equations for updating {w, v}.
Hint: Read Section 11.7.2 to see how Equations 11.23 and 11.24 are derived from Equation 11.22.

Hint 2: LReLU'(x) = 0.01, for x < 0
                    1,    otherwise
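For reference, the chain rule gives updates of the following general shape. This is only a sketch, assuming a stepsize η and batch gradient descent; verify each term against Equation 11.22 and Section 11.7.2 before relying on it.

```latex
% Output layer: with y^t = \mathrm{sigmoid}(v^T z^t), the cross-entropy
% error gives \partial E/\partial (v^T z^t) = y^t - r^t, so
\Delta v_h = \eta \sum_t (r^t - y^t)\, z_h^t, \qquad z_0^t = 1.
% Sigmoid hidden unit z_1:
\Delta w_{1,j} = \eta \sum_t (r^t - y^t)\, v_1\, z_1^t (1 - z_1^t)\, x_j^t.
% LReLU hidden unit z_2, using Hint 2 for the derivative:
\Delta w_{2,j} = \eta \sum_t (r^t - y^t)\, v_2\, \mathrm{LReLU}'(w_2^T x^t)\, x_j^t.
% Shared weights w = w_1 = w_2: the two gradient paths add,
\Delta w_j = \eta \sum_t (r^t - y^t)
  \left[ v_1 z_1^t (1 - z_1^t) + v_2\, \mathrm{LReLU}'(w^T x^t) \right] x_j^t.
```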

(40 points) Implement a Multilayer Perceptron (MLP) with stochastic gradient descent to classify the optical-digit data. Train your MLPs on the "optdigits_train.txt" data, tune the number of hidden units using the "optdigits_valid.txt" data, and test the prediction performance using the "optdigits_test.txt" data. (Read the submission instructions carefully to prepare your submission files.)


Implement an MLP with 1 hidden layer using the ReLU activation function:

ReLU(x) = 0, for x < 0
          x, otherwise

Use the MLP for classifying the 10 digits. Read the algorithm in Figure 11.11 and Section 11.7.3 in the textbook. When using the ReLU activation function, the online version of Equation 11.29 becomes:

Δw_{hj} = 0,                                   for w_h^T x < 0
          η [ Σ_i (r_i - y_i) v_{ih} ] x_j,    otherwise
Try MLPs with {3, 6, 9, 12, 15, 18} hidden units. Report and plot the training and validation error rates by the number of hidden units. How many hidden units should you use? Report the error rate on the test set using this number of hidden units.
Hint: When choosing the best stepsize η (between 0 and 1, such as 10^{-5}), you might need to start with some value and, after a certain number of iterations, decrease your η to improve the convergence. Alternatively, you can implement Momentum or an Adaptive Learning Rate (Section 11.8.1 in the textbook).
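Although your submission must be in Matlab, the per-sample update performed by the algorithm in Figure 11.11 can be sketched language-agnostically. Below is a minimal NumPy sketch (function and variable names are our own, not part of the assignment) of one online SGD step for a 1-hidden-layer ReLU MLP with a softmax output:

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def sgd_step(x, r, W, V, eta):
    """One online (per-sample) gradient update.
    x: (d+1,) input with bias term x[0] = 1
    r: (k,) one-hot label
    W: (m, d+1) input-to-hidden weights; V: (k, m+1) hidden-to-output weights
    """
    z = np.maximum(0.0, W @ x)              # ReLU hidden units, shape (m,)
    z1 = np.concatenate(([1.0], z))         # prepend hidden bias unit, (m+1,)
    y = softmax(V @ z1)                     # output class probabilities, (k,)
    dV = np.outer(r - y, z1)                # output-layer gradient
    dz = (V[:, 1:].T @ (r - y)) * (z > 0)   # backprop through ReLU (zero where inactive)
    dW = np.outer(dz, x)                    # hidden-layer gradient
    return W + eta * dW, V + eta * dV       # gradient step with stepsize eta
```

Looping this step over a random shuffle of the training samples each epoch yields the stochastic gradient descent required here.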


Train your MLP with the best number of hidden units obtained. Combine the training set and the validation set as one (training+validation) dataset and run the trained MLP from problem 2(a) on this data. Apply PCA to the values obtained from the hidden units (you can use the Matlab pca() function). Using the projection onto the first 2 principal components, make a plot of the training+validation dataset (similar to Figure 11.18 in the textbook). Use different colors for different digits and label each sample with its corresponding digit (the same as you did in HW3). Repeat the same, projecting the dataset onto the first 3 principal components, and visualize it with a 3D plot. (Hint: you can use the MATLAB function plot3() to visualize the 3D data.) Compare the 2D and 3D plots and explain the results in the report.

Note: Change the x-axis and y-axis to log scale in order to better visualize the data points.
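In Matlab the projection step is a single pca() call; for intuition, the same computation can be sketched with an SVD. This NumPy sketch (names illustrative only) projects the n × m matrix of hidden-unit values onto the leading principal components:

```python
import numpy as np

def pca_project(Z, n_components=2):
    """Project rows of Z (n_samples x m hidden-unit values) onto the
    first n_components principal components, as Matlab's pca() would."""
    Zc = Z - Z.mean(axis=0)                      # center the hidden activations
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T              # scores: n_samples x n_components
```

Plotting the first two score columns against each other (colored by digit) gives the 2D figure; the first three give the 3D figure.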

(30 points) MATLAB provides the Deep Learning Toolbox for designing and implementing deep neural networks. In this homework question, you will learn how to create simple convolutional neural networks (CNNs) for optdigits classification.


Read the MATLAB documentation^2 to get familiar with how to:

- Load and explore image data.
- Define the network architecture.
- Specify training/validation options.
- Train the network.
- Predict the labels of testing data and calculate the classification accuracy.

Read another MATLAB documentation page^3 to learn how to define your own customized layer.
^2 https://www.mathworks.com/help/deeplearning/examples/create-simple-deep-learning-network-for-classification.html
^3 https://www.mathworks.com/help/releases/R2018a/nnet/ug/define-custom-deep-learning-layer.html

Run the dataPreparation.m script to convert the three optdigits .txt files into the required input formats. Modify the examplePreluLayer.m file (as described in the Completed Layer section of footnote 3) to define a class called myLReLULayer, which creates the leaky ReLU layer as defined in Question 1.
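Before writing myLReLULayer, it helps to pin down exactly what its forward and backward passes must compute. Here is the math in a small NumPy sketch (illustrative only; the actual submission must be the Matlab layer class):

```python
import numpy as np

def lrelu_forward(x, alpha=0.01):
    # LReLU(x) = alpha * x for x < 0, x otherwise (Question 1 definition)
    return np.where(x < 0, alpha * x, x)

def lrelu_backward(x, dy, alpha=0.01):
    # LReLU'(x) = alpha for x < 0 and 1 otherwise, so the incoming
    # gradient dy is scaled elementwise by that derivative
    return dy * np.where(x < 0, alpha, 1.0)
```

Both operations are elementwise, so they apply unchanged to the multi-dimensional activation arrays a convolutional layer produces.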

Modify the Define Network Architecture section in the main.m file to test the following two CNN structures.


Input layer → 2D convolution layer (1 filter of size 4) → Batch normalization layer → LReLU layer (use your own customized myLReLULayer class) → Fully connected layer → Softmax layer → Classification layer



Input layer → 2D convolution layer (20 filters of size 3) → Batch normalization layer → LReLU layer (use your own customized myLReLULayer class) → Pooling layer (use max pooling with pool size 3 and stride 2) → 2D convolution layer (32 filters of size 3) → Batch normalization layer → LReLU layer (use your own customized myLReLULayer class) → Fully connected layer → Softmax layer → Classification layer

For both network structures, take a screenshot of the Training Progress plot generated by MATLAB, and report the accuracies on the testing data.
Instructions
Solutions to all questions must be presented in a report which includes result explanations and all images and plots.
All programming questions must be written in Matlab; no other programming languages will be accepted. The code must be able to be executed from the Matlab command window on the cselabs machines. Each function must take the inputs in the order specified and print/display the required output to the Matlab command window. For each part, you can submit additional files/functions (as needed) which will be used by the main functions specified below. Put comments in your code so that one can follow the key parts and steps. Please follow the rules strictly. If we cannot run your code, you will receive no credit.
Question 2:
- Train an MLP: mlptrain(train_data.txt: path to training data file, val_data.txt: path to validation data, m: number of hidden units, k: number of output units). The function must return in variables the outputs (z: an n × m matrix of hidden unit values, w: an m × (d+1) matrix of input unit weights, and v: a k × (m+1) matrix of hidden unit weights). The function must also print the training and validation error rates for the given function parameters.
- Test an MLP: mlptest(test_data.txt: path to test data file, w: an m × (d+1) matrix of input unit weights, v: a k × (m+1) matrix of hidden unit weights). The function must return in variables the outputs (z: an n × m matrix of hidden unit values), where n is the number of test samples. The function must also print the test set error rate for the given function parameters.
- mlptrain will implement an MLP with d inputs and one input bias unit, m hidden units and one hidden bias unit, and k outputs.
- problem2a.m and problem2b.m: scripts to solve problems 2(a) and 2(b), respectively, calling the appropriate functions.

You may find the following built-in Matlab functions useful: repmat() and reshape(). For the optdigits data, the first 64 columns are the data and the last column is the label.
Submission
Things to submit:

- hw4_sol.pdf: A PDF document which contains the report with solutions to all questions.
- mlptrain.m: The Matlab code of the mlptrain function.
- mlptest.m: The Matlab code of the mlptest function.
- problem2a.m: Code to solve problem 2(a).
- problem2b.m: Code to solve problem 2(b).
- myLReLULayer.m: Your own customized leaky ReLU layer in problem 3(b).
- main.m: The modified script for the network structure in problem 3(c)(ii).
- Any other files, except the data, which are necessary for your code.

Submit: hw4_sol.pdf and a zip file of all other files must be submitted electronically via Canvas.