Description
This homework is intended to help you (1) learn (or refresh your understanding of) how to implement linear algebraic operations in Python using numpy (to which we refer in the code below as np); (2) practice implementing one of the simpler machine learning algorithms: stepwise classi cation.

Part 1: Python and numpy
For each of the problems below, write a method (e.g., problem1) that returns the answer for the corresponding problem. Put all your methods in one le called homework1 WPIUSERNAME.py
(e.g., homework1 jrwhitehill.py, or homework1 jrwhitehill lleshin.py for a team). See the starter le homework1 template.py. In all problems, you may assume that the the dimensions of the matrices and/or vectors are compatible for the requested mathematical operations.
Note 1: In mathematical notation we usually start indices with j = 1. However, in numpy (and many other programming settings), it is more natural to use 0based array indexing. When answering the questions below, do not worry about \translating” from 1based to 0based indexes. For example, if the (i; j)th element of some matrix is requested, (where i; j ), you can simply write A[i,j].
Note 2: To represent and manipulate arrays, please use numpy’s array class (not the matrix class).

Given matrices A and B, compute and return an expression for A + B. [ 2 pts ]
Answer (freebie!): While it is completely valid to use np.add(A, B), this is unnecessarily verbose; you really should make use of the \syntactic sugar” provided by Python’s/numpy’s operator overloading and just write: A + B. Similarly, you should use the more compact (and arguably more elegant) notation for the rest of the questions as well.
2. Given matrices A, B, and C, compute and return AB C (i.e., rightmultiply matrix A by matrix B, and then subtract C). Use dot or np.dot. [ 2 pts ]

Given matrices A, B, and C, return A B + C^{>}, where represents the elementwise (Hadamard) product and > represents matrix transpose. In numpy, the elementwise product is obtained simply with *. [ 2 pts ]

Given column vectors x and y, compute the inner product of x and y (i.e., x^{>}y). [ 2 pts ]

Given matrix A, return a matrix with the same dimensions as A but that contains all zeros. Use np.zeros. [ 2 pts ]

Given matrix A, return a vector with the same number of rows as A but that contains all ones. Use np.ones. [ 2 pts ]

Given square matrix A and (scalar) , compute A + I, where I is the identity matrix with the same dimensions as A. Use np.eye. [ 2 pts ]

Given matrix A and integers i; j, return the jth column of the ith row of A, i.e., A_{ij}. [ 2 pts ]
Given matrix A and integer i, return the sum of all the entries in the ith row, i.e., 
A 
. Do not 

9. 
use a loop, which in Python is very slow. Instead use the np.sum function. [ 4 pts ]^{P}^{j} 
ij 

10. 
Given matrix A and scalars c; d, compute the arithmetic mean over all entries of A that are between 

c and d (inclusive). In other words, if S = f(i; j) : c A_{ij} dg, then compute 
1 
A_{ij}. Use 

jSj ^{P} 
(i;j) 
2S 

np.nonzero along with np.mean. [ 5 pts ] 

Given an (n n) matrix A and integer k, return an (n k) matrix containing the righteigenvectors of A corresponding to the k largest eigenvalues of A. Use np.linalg.eig to compute eigenvectors. [ 5 pts ]
1

Given square matrix A and column vector x, use np.linalg.solve to compute A ^{1}x. Do not use np.linalg.inv or ** 1 to compute the inverse explicitly; this is numerically unstable and can, in some situations, give incorrect results. [ 5 pts ]

Given square matrix A and row vector x, use np.linalg.solve to compute xA ^{1}. Hint: AB = (B^{>}A^{>})^{>}. Do not use np.linalg.inv or ** 1 to compute the inverse explicitly. [ 5 pts ]

Part 2: Stepwise Classi cation
For the tasks below, write your code in a le called homework1 smile WPIUSERNAME.py, and put your experimental results in homework1 smile WPIUSERNAME.pdf.
In this part of the assignment you will train a very simple smile classi er that analyzes a grayscale image x 2 R^{24 24} and outputs a prediction y^ 2 f0; 1g indicating whether the image is smiling (1) or not (0). The classi er will make its decision based on very simple features of the input image consisting of binary comparisons between pixel values. Each feature is computed as
^{I[x}r_{1};c_{1} ^{>} ^{x}r_{2};c_{2} ^{]}
where r_{i}; c_{i} 2 f0; 1; 2; : : : ; 23g are the row and column indices, respectively, and I[ ] is an indicator function whose value is 1 if the condition is true and 0 otherwise. In general, these features are not very good, but nonetheless they will enable the classi er to achieve an accuracy (f_{PC}) much better than just guessing or just choosing the dominant class. Based on these features, you should train an ensemble smile classi er using stepwise classi cation for m = 5 features. The output of the ensemble (1 if it thinks the image is smiling, 0 otherwise) is determined by the average prediction across all m members of the ensemble. If more than half of the m ensemble predictors g^{(1)}; : : : ; g ^{(m)} think that the image is smiling, then the ensemble says it’s a smile; otherwise, the ensemble says it’s not smiling. (See slide 18 of Lecture2.pdf).
Stepwise classi cation/regression is a greedy algorithm: at each round j, choose the jth feature (r_{1}; c_{1}; r_{2}; c_{2}) such that { when it is added to the set of j 1 features that have already been selected { the accuracy (f_{PC}) of the overall classi er on the training set is maximized. More speci cally, at every round j, consider every possible tuple of pixel locations (r_{1}; c_{1}; r_{2}; c_{2}): if you construct an ensemble with j predictors (the j 1 you’ve already chosen, plus the current \candidate” (r_{1}; c_{1}; r_{2}; c_{2})), is the resulting ensemble more accurate (in terms of PC on training data) than any other tuple of pixel locations during this round? If the current tuple is the best yet (for round j), then save it as your \best seen yet” for round j. If not, ignore it. Move on to the next possible tuple of pixel locations, and repeat until you’ve searched all of them. At the end of round j, you will have selected the best feature for that round, and you add it to your set of selected features. Once added, it stays in the set forever { it can never be removed. (Otherwise, it wouldn’t be a greedy algorithm at all.) Now you move on to round j + 1 until you’ve completed m = 5 rounds.
To measure the ensemble’s accuracy (f_{PC}), you should run it on all the images in the training set, and then compare the output of the ensemble to the corresponding groundtruth labels. At the end of the entire training procedure, you should estimate how well your \machine” (ensemble smile classi er) works on a set of images not used for training, i.e., the test set.
Skeleton code: While how you write your code is up to you (subject to the vectorization constraint and also basic readability); however, to get you started, we sketched in a few functions:
fPC (y, yhat): this takes in a vector of groundtruth labels and corresponding vector of guesses, and then computes the accuracy (PC). The implementation (in vectorized form) should only take 1line.
measureAccuracyOfPredictors (predictors, X, y): this takes in a set of predictors, a set of images to run it on, as well as the groundtruth labels of that set. For each image in the image set, it runs the ensemble (see slide 18 of Lecture2.pdf) to obtain a prediction. Then, it computes and returns the accuracy (PC) of the predictions w.r.t. the groundtruth labels.
2
stepwiseRegression (trainingFaces, trainingLabels, testingFaces, testingLabels): I’ve included some visualization code, but otherwise it’s empty. You need to implement the stepwise classication described above.
Tasks:

Download from Canvas the starter Python le homework1 smile.py as well as the following data les: trainingFaces.npy,trainingLabels.npy,testingFaces.npy,testingLabels.npy.

Write code to train a stepwise classi er for m = 5 features of the binary comparison type described above; the greedy procedure should maximize f_{PC} (which you will also need to implement). At each round, the code should examine every possible feature (r_{1}; c_{1}; r_{2}; c_{2}).. Make sure your code is vectorized to improve runtime performance (wallclock time) [ 25 pts ].

Write code to analyze how training/testing accuracy changes as a function of number of examples n 2 f400; 800; 1200; 1600; 2000g (implement this in a forloop)

Run your stepwise classi er training code only on the rst n examples of the training set.



Measure and record the training accuracy of the trained classi er on the n examples.



Measure and record the testing accuracy of the classi er on the (entire) test set.

Important: you must write code (a simple loop) to do this { do not just do it manually for each value of n. This is good experimental practice in general and is especially important in machine learning to ensure reproducibility of results. Using the entire training set, you should achieve a test accuracy of at least 75%. [ 8 pts ].

In a PDF document (you can use whatever tool you like { LaTeX, Word, Google Docs, etc. { but make sure you export to PDF), show the training accuracy and testing accuracy as a function of n. Please use the following simple format:
n trainingAccuracy testingAccuracy 400 … …
800 … …
1200 … …
1600 … …
2000 … …
Moreover, characterize in words how the training accuracy and testing accuracy changes as a function of n, and how the two curves relate to each other. What trends do you observe? [4 pts]

Visualizing the learned features: It is very important in empirical machine learning work to visualize what was actually learned during training. This can be useful for debugging to make sure that your training code is working as it should, and also to make sure your training and testing sets are selected wisely. For n = 2000, visualize the m = 5 features that were learned by (a) displaying any face image from the test set; and (b) drawing a square around the speci c pixel locations ((r_{1}; c_{1}) and (r_{2}; c_{2})) that are examined by the feature. You can use the example code in the homework1 smile.py template to render the image. Insert the graphic (just one showing all 5 features) into your PDF le. [ 3 pts].
Here’s an example that shows just one feature:
3
Tip on vectorization: Implement your training algorithm so that, for any particular feature (r_{1}; c_{1}; r_{2}; c_{2}), all the feature values (over all the n training images) are extracted at once using numpy { do not use a loop. Even after vectorizing my own code, running the experiments required in this assignment took about 1 hour (on a single CPU of a MacBook 2016 laptop).
4