Description
Problem 1: Linear regression learns a linear function of the feature variables X to fit the responses y. In this problem, you will derive the closed-form solutions for two linear regression formulations.
1. The standard linear regression can be formulated as solving the least-squares problem

   minimize_w  ||Xw − y||^2,

where X ∈ R^{n×m} (n ≥ m) is the feature matrix, y ∈ R^{n×1} is the response vector, and w ∈ R^{m×1} is the vector of linear coefficients. This is a convex objective function of w. Derive the optimal w by setting the derivative of the function with respect to w to zero to minimize the objective function.
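As a numerical sanity check (not a substitute for the derivation), the closed-form answer obtained from the normal equations can be compared against a generic least-squares solver. The data below is made up purely for illustration:

```python
import numpy as np

# Hypothetical data: n = 6 samples, m = 2 features (n >= m).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = rng.normal(size=(6,))

# Candidate closed-form solution via the normal equations.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Reference answer from NumPy's least-squares routine.
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_closed, w_ref))  # True when X has full column rank
```

If the derivation is correct, the two vectors agree whenever X has full column rank.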

In practice, an L2-norm regularizer is often added to the least-squares objective, called Ridge Regression, to overcome ill-posed problems where the Hessian matrix is not positive definite. The objective function of ridge regression is defined as

   minimize_w  ||Xw − y||^2 + λ||w||^2,

where λ > 0. This objective function is strictly convex. Derive the solution of the ridge regression problem to find the optimal w.
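Again as a sanity check rather than a derivation: at the optimal w, the gradient 2Xᵀ(Xw − y) + 2λw must vanish. The data and λ below are hypothetical:

```python
import numpy as np

# Hypothetical data and regularization weight (lambda), for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8,))
lam = 0.5

# Candidate ridge solution: solve (X^T X + lam * I) w = X^T y.
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# The gradient of ||Xw - y||^2 + lam * ||w||^2 should vanish at the optimum.
grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
print(np.allclose(grad, 0))  # True
```

Because the objective is strictly convex for λ > 0, a vanishing gradient certifies the unique minimizer.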
Problem 2: Consider a coin with probability of heads equal to Pr(H) = p and probability of tails Pr(T) = 1 − p. You toss it 5 times and get outcomes H,H,T,T,H.

What is the probability of observing the sequence H,H,T,T,H in five tosses? Also give the formula for the natural logarithm of this probability. Both formulas should be functions of p.

You have a box containing exactly 2 coins, one fair with p = 1/2 and one biased with p = 2/3. You choose one of these two coins at random with equal probability, toss it 5 times and get the outcome H,H,T,T,H.


Give the joint probability that the coin chosen was the fair coin (p = 1/2) and the outcome was H,H,T,T,H.



Give the joint probability that the coin chosen was the biased coin (p = 2/3) and the outcome was H,H,T,T,H.


Assuming p is unknown, what bias p = Pr(H) maximizes the probability of observing H,H,T,T,H (i.e., what is the maximum likelihood estimate for p), and what is the corresponding probability of observing H,H,T,T,H? Show the derivation. Hint: maximize the log of the likelihood.
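Whatever estimate the derivation produces can be checked numerically: a brute-force grid search over p should land on the same value. A minimal sketch (the sequence H,H,T,T,H has 3 heads and 2 tails):

```python
import math

# Log-likelihood of H,H,T,T,H (3 heads, 2 tails) as a function of p.
def log_lik(p):
    return 3 * math.log(p) + 2 * math.log(1 - p)

# Brute-force maximization over a fine grid of p values in (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
p_best = max(grid, key=log_lik)
print(p_best)
```

The grid maximizer should agree (to grid precision) with the value obtained by setting the derivative of the log-likelihood to zero.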
Problem 3: Below is the pseudocode of the perceptron algorithm for binary classification:

1. Initialize w = w_0.
2. Iterate until convergence:
     For each sample (x^t, r^t):
       If ⟨w, x^t⟩ r^t ≤ 0:
         w = w + r^t x^t
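The pseudocode above can be sketched compactly in Python rather than MATLAB (the variable names mirror the pseudocode; the dataset below is tiny and made up, not data1.mat):

```python
import numpy as np

def my_perceptron(X, r, w0, max_iter=1000):
    """Perceptron: repeat full passes over the data until no sample is misclassified."""
    w = w0.astype(float).copy()
    for step in range(1, max_iter + 1):
        updated = False
        for x_t, r_t in zip(X, r):
            if r_t * np.dot(w, x_t) <= 0:   # misclassified (or on the boundary)
                w = w + r_t * x_t           # perceptron update
                updated = True
        if not updated:                     # converged: one full clean pass
            return w, step
    return w, max_iter

# Tiny hypothetical linearly separable data with labels +1 / -1.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
r = np.array([1, 1, -1, -1])
w, steps = my_perceptron(X, r, np.array([1.0, 1.0]))
print(np.sign(X @ w))  # should match the labels
```

On linearly separable data the loop terminates; on non-separable data it would cycle forever, which is why `max_iter` guards the outer loop.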
Implement the perceptron algorithm and test it on the provided data. To begin, load the file data1.mat into MATLAB. X ∈ R^{40×2} is the feature matrix of 40 samples in 2 dimensions and y ∈ R^{40×1} is the label vector (+1/−1). Initialize w to be the vector [1; 1]. Visualize all the samples (use different colors for the two classes) and plot the decision boundary defined by the initial w.
Now, run your perceptron algorithm on the given data. How many iterations does it take to converge? Plot the decision boundary defined by the w returned by the perceptron program.
Hint: To visualize the samples you could use the MATLAB function scatter(). Plotting the boundary is equivalent to plotting the line w^T x = 0. Therefore, you could first generate a vector a to be your x-axis, then compute the y-axis vector b as b = −(w(1)/w(2))·a. Once the plot is generated you could use the axis() function to make sure your axes are in the right range. When you are done, your plots should look like the figures below:
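The boundary formula follows from w^T x = 0 in 2-D: for a point x = [a; b] on the line, w₁a + w₂b = 0, so b = −(w₁/w₂)a whenever w₂ ≠ 0. A quick Python check with a hypothetical weight vector:

```python
import numpy as np

# Hypothetical weight vector; w[1] must be nonzero for this boundary form.
w = np.array([2.0, -3.0])

a = np.linspace(-5, 5, 11)      # x-axis values
b = -(w[0] / w[1]) * a          # matching y-axis values on the boundary

# Every point (a_i, b_i) should satisfy w^T x = 0.
print(np.allclose(w[0] * a + w[1] * b, 0))  # True
```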

In the previous question, the samples from the two classes are linearly separable. Now let's look at a linearly non-separable case. Load the file data2.mat into MATLAB and run your perceptron algorithm with w = [1; 1]. Can the perceptron algorithm converge? Explain why. To improve the algorithm, we can introduce a "soft" linear classifier that tolerates errors. It turns out we can solve the following LP problem:
   minimize_{w, ξ^t}  Σ_t ξ^t
   subject to  r^t (w^T x^t) ≥ 1 − ξ^t,
               ξ^t ≥ 0.
Here ξ^t is the error (slack) for sample t, whose total is minimized. The MATLAB built-in function linprog() can be used for this problem. Now, run the following MATLAB code to solve the LP problem on data2.mat.
[m,n]=size(X);
f=[zeros(n,1);ones(m,1)];          % objective: minimize the sum of slacks xi
A1=[-X.*repmat(y,1,n),-eye(m,m)];  % -y_t*x_t'*w - xi_t <= -1
A2=[zeros(m,n),-eye(m,m)];         % -xi_t <= 0, i.e. xi_t >= 0
A=[A1;A2];                         % stack into standard form A*x <= b
b=[-ones(m,1);zeros(m,1)];
x=linprog(f,A,b);                  % solve the LP
w=x(1:n);                          % extract the variable w
Apply this algorithm to data2.mat, visualize the data, and plot the boundary defined by the w returned by the LP.
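For readers working outside MATLAB, the same LP can be sketched with SciPy's linprog. This mirrors the structure of the MATLAB code above on a tiny made-up dataset (data2.mat itself is not reproduced here):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny hypothetical dataset (not data2.mat): labels are +1 / -1.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = X.shape

# Variables are [w; xi]. Objective: minimize the sum of slacks xi.
c = np.concatenate([np.zeros(n), np.ones(m)])

# Constraint y_t * w^T x_t >= 1 - xi_t, rewritten as
# -(y_t * x_t)^T w - xi_t <= -1 for linprog's A_ub @ x <= b_ub form.
A_ub = np.hstack([-(X * y[:, None]), -np.eye(m)])
b_ub = -np.ones(m)

# xi >= 0, while w is unbounded.
bounds = [(None, None)] * n + [(0, None)] * m

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w = res.x[:n]
print(res.success, np.sign(X @ w))  # the signs should match the labels
```

On separable data like this toy example the optimal slacks are zero; on non-separable data the LP remains feasible and simply pays for the errors through the slacks.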
Submission
Things to submit:

1. hw0 sol.pdf: a document containing all the derivations for Problems 1 and 2 and the three plots asked for in Problem 3.

2. MyPerceptron.m: a MATLAB function defined as [w, step]=MyPerceptron(X, y, w0), where X is the feature matrix, y is a label vector, and w0 is the initialization of the parameter w. In the output, w is the parameter found by the perceptron and step is the number of steps the algorithm takes to converge. The function should also display the plot of the samples and the boundary.

3. Zip all the files into a single zipped file and name it with your name.

Submit: All material must be submitted electronically via Canvas. This homework will not be graded, but it is required as proof of satisfying the prerequisites for taking the class.