Description
This homework contains 2 questions. The last question requires programming. The maximum number of points is 100 plus 20 bonus points.

PCA via Successive Deflation [30 points]
(Adapted from Murphy Exercise 12.7)
Suppose we have a set of $n$ data points $x_1, \ldots, x_n$, where each $x_i$ is represented as a $d$-dimensional column vector. Assume that the data has been centered, i.e., has zero mean: $\frac{1}{n}\sum_{i=1}^{n} x_i = 0$. Let $X = [x_1, \ldots, x_n]$ be the $(d \times n)$ matrix where column $i$ is equal to $x_i$. Define $C = \frac{1}{n} X X^T$ to be the covariance matrix of $X$, where $c_{ij} = \frac{1}{n}\sum_{l=1}^{n} x_{il} x_{jl} = \mathrm{covar}(i, j)$.
Next, order the eigenvectors of $C$ by their eigenvalues (largest first), and let $v_1, v_2, \ldots, v_k$ be the first $k$ eigenvectors. These satisfy
$$v_i^T v_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$
$v_1$ is the first principal eigenvector of $C$ (the eigenvector with the largest eigenvalue), and as such satisfies $C v_1 = \lambda_1 v_1$. Now define $\tilde{x}_i$ as the orthogonal projection of $x_i$ onto the space orthogonal to $v_1$:
$$\tilde{x}_i = (I - v_1 v_1^T)\, x_i$$
Finally, define $\tilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_n]$ as the deflated matrix of rank $d - 1$, which is obtained by removing from the $d$-dimensional data the component that lies in the direction of the first principal eigenvector:
$$\tilde{X} = (I - v_1 v_1^T)\, X$$
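The definitions above can be checked numerically on toy data. The following is a minimal sketch, assuming NumPy and randomly generated points; the sizes `d` and `n`, the random seed, and the variable names are illustrative choices and not part of the assignment. It only illustrates the definitions and is not a substitute for the derivations asked for below.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 50  # illustrative sizes (assumption)

# Toy data: columns are data points, centered so that each row has zero mean.
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)

# Covariance C = (1/n) X X^T and its leading eigenvector v1.
C = X @ X.T / n
eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
v1 = eigvecs[:, -1]

# Deflated data: X_tilde = (I - v1 v1^T) X.
P = np.eye(d) - np.outer(v1, v1)
X_tilde = P @ X

# Every deflated point should be orthogonal to v1, and one dimension is lost.
print(np.allclose(v1 @ X_tilde, 0.0))
print(np.linalg.matrix_rank(X_tilde))
```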

1. [7 points] Show that the covariance of the deflated matrix,
$$\tilde{C} = \frac{1}{n} \tilde{X} \tilde{X}^T,$$
is given by
$$\tilde{C} = \frac{1}{n} X X^T - \lambda_1 v_1 v_1^T$$
(Hint: Some useful facts: $(I - v_1 v_1^T)$ is symmetric, $X X^T v_1 = n \lambda_1 v_1$, and $v_1^T v_1 = 1$. Also, for any matrices $A$ and $B$, $(AB)^T = B^T A^T$.)
2. [7 points] Show that for $j \neq 1$, if $v_j$ is a principal eigenvector of $C$ with corresponding eigenvalue $\lambda_j$ (that is, $C v_j = \lambda_j v_j$), then $v_j$ is also a principal eigenvector of $\tilde{C}$ with the same eigenvalue $\lambda_j$.
3. [8 points] Let $u$ be the first principal eigenvector of $\tilde{C}$. Explain why $u = v_2$. (You may assume $u$ is unit norm.)
4. [8 points] Suppose we have a simple method $f$ for finding the leading eigenvector and eigenvalue of a positive-definite matrix, denoted by $[\lambda, u] = f(C)$. Write some pseudocode for finding the first $k$ principal basis vectors of $X$ that only uses the special $f$ function and simple vector arithmetic.
(Hint: This should be a simple iterative routine that takes only a few lines to write. The input is $C$, $k$, and the function $f$; the output should be $v_j$ and $\lambda_j$ for $j \in \{1, \ldots, k\}$.)
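To make the role of $f$ concrete, here is one possible way such a function could be realized. This is a minimal sketch that assumes power iteration as the underlying method; the assignment does not specify how $f$ works, and the parameter names `num_iters`, `tol`, and `seed` are hypothetical. It shows only the black box $f$, not the routine the question asks for.

```python
import numpy as np

def f(C, num_iters=1000, tol=1e-12, seed=0):
    """Power iteration (an assumed choice): return (eigenvalue, unit eigenvector)
    for the largest eigenvalue of a symmetric positive-definite matrix C."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=C.shape[0])
    u = u / np.linalg.norm(u)
    lam = 0.0
    for _ in range(num_iters):
        w = C @ u
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break  # C maps u to zero; nothing more to iterate on
        u_next = w / norm
        lam_next = u_next @ C @ u_next  # Rayleigh quotient estimate
        converged = abs(lam_next - lam) < tol
        u, lam = u_next, lam_next
        if converged:
            break
    return lam, u

# Usage on a small matrix whose eigenvalues are 3 and 1.
C_example = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, u = f(C_example)
print(lam, u)
```

The returned vector is only determined up to sign, so comparisons against a reference eigenvector should use the absolute value of the inner product.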
In this question, you will train a convolutional neural network (CNN) to classify images and videos using PyTorch. We use the UCF101 data (see http://crcv.ucf.edu/data/UCF101.php). There are also 10 classes of data in this homework, but the data and the classes are different from those of Homework 4. Each clip has 3 frames and each frame is 64 × 64 pixels. The labels of the train and validation clips are provided in hw6_data.mat.
You will first train a CNN for action classification for each image. You will then improve the network architecture and submit the classification results on the test data to Kaggle. Then, you will train a CNN using 3D convolution for a set of video frames (rather than for individual frames), and submit your results to Kaggle.
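The difference between the per-frame and per-clip settings shows up in the tensor shapes. The following is a minimal sketch, assuming RGB frames (3 channels), a batch of 4, and 8 output channels; these values are illustrative assumptions and are not taken from the notebook. It applies a single `nn.Conv2d` layer to a batch of frames and a single `nn.Conv3d` layer to a batch of 3-frame clips.

```python
import torch
import torch.nn as nn

batch = 4  # illustrative batch size (assumption)

# Per-frame input: (batch, channels, height, width). RGB input is an assumption.
frames = torch.randn(batch, 3, 64, 64)
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out2d = conv2d(frames)

# Per-clip input: (batch, channels, frames, height, width), with 3 frames per clip.
clips = torch.randn(batch, 3, 3, 64, 64)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out3d = conv3d(clips)

print(out2d.shape)
print(out3d.shape)
```

With `kernel_size=3` and `padding=1`, both layers preserve the spatial size, and the 3D layer also preserves the frame dimension, so the kernel slides across time as well as space.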
The detailed instructions and questions are in the Jupyter notebook Action_CNN.ipynb. In this file, there are 8 'ToDo' spots for you to fill. The score of each ToDo is specified at the spot. For the 5th and 8th ToDos, you need to submit CSV result files to Kaggle. The results will be evaluated by Categorization Accuracy. For the 5th ToDo, submit to https://www.kaggle.com/c/cse512f18hw6img. For the 8th ToDo, submit to https://www.kaggle.com/c/cse512f18hw6vid.
We will maintain a leaderboard for each Kaggle competition, and the top three entries at the end of the competition (official assignment due date) will receive 10 bonus points. Any submission that rises to the top three after the assignment deadline is not eligible for bonus points. The ranking will be based on the Categorization Accuracy. Marks for these questions will be scaled according to the ranking on the Private Leaderboard. To prevent exploiting test data, you are allowed to make a maximum of 2 submissions per 24 hours. Your submission will be evaluated immediately and the leaderboard will be updated.
Environment setting
Please make a ./data folder in the same directory as the Action_CNN.ipynb file. Put the data folders ./trainClips, ./valClips, ./testClips and the file hw6_data.mat under ./data.
We recommend using a virtual environment for the project. If you choose not to use a virtual environment, it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set up a virtual environment, run the following in the command-line interface:
cd your_hw6_folder
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Note that this does NOT install TensorFlow or PyTorch,
# which you need to do yourself.

# Work (hard) on the assignment ...
# ... and when you're done:
deactivate                       # Exit the virtual environment
Note that every time you want to work on the assignment, you should run ‘source .env/bin/activate’ (from within your hw6 folder) to reactivate the virtual environment, and deactivate again whenever you are done.

What to submit?
3.1 Blackboard submission
You will need to submit both your code and your answers to questions on Blackboard. Put the answer file and your code in a folder named SBUID_FirstName_LastName (e.g., 10947XXXX_lionel_messi). Zip this folder and submit the zip file on Blackboard. Your submission must be a zip file, i.e., SBUID_FirstName_LastName.zip.
The answer file should be named answers.pdf. The first page of answers.pdf should be the filled-in cover page at the end of this homework. The remainder of the answer file should contain:
1. Answers (and derivations) to Question 1
You can use LaTeX if you wish, but it is not compulsory.
3.2 Kaggle submission
For Question 2, you must submit a CSV file to the competition sites mentioned above to get the accuracy. A submission file should contain two columns: Id and Class. The file should contain a header and have the following format.
Id,Class
42,2
43,5
...
Two sample submission files are available from the competition site and our handout.
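A file in this two-column layout can be produced with Python's standard `csv` module. This is a minimal sketch, assuming a comma-separated file with the header `Id,Class`; the function name `write_submission`, the output file name, and the example predictions are hypothetical and used only for illustration. The sample submission files remain the authoritative reference for the exact format.

```python
import csv

def write_submission(path, ids, classes):
    """Write a two-column CSV with a header row followed by one row per clip."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Id", "Class"])
        for clip_id, label in zip(ids, classes):
            writer.writerow([clip_id, label])

# Hypothetical predictions, for illustration only.
write_submission("submission_example.csv", [42, 43], [2, 5])
```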

Cheating warnings
Don’t cheat. You must do the homework yourself, otherwise you won’t learn. You cannot ask students from previous years about the homework or discuss it with them. You cannot look up the solution online.
CSE512 Fall 2018 – Machine Learning – Homework 6
Your Name:
Solar ID:
NetID email address:
Names of people whom you discussed the homework with: