Description

(10 points) Exercise 3.6 (page 92) in LFD.

(20 points) Recall that the objective function for linear regression can be expressed as

    E(w) = (1/N) ||Xw - y||^2,

as in Equation (3.3) of LFD. Minimizing this function with respect to w leads to the optimal solution

    w = (X^T X)^{-1} X^T y.

This solution holds only when X^T X is nonsingular. To overcome this problem, the following objective function is commonly minimized instead:

    E_2(w) = ||Xw - y||^2 + λ||w||^2,

where λ > 0 is a user-specified parameter. Please do the following:


(10 points) Derive the optimal w that minimizes E_2(w).



(10 points) Explain how this new objective function can overcome the singularity problem of X^{T} X.
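As a sanity check on the above, here is a minimal numerical sketch (using NumPy; the matrix X below is a hypothetical example, not the assignment's data) showing that X^T X can be singular while X^T X + λI is invertible, so the regularized solution w = (X^T X + λI)^{-1} X^T y always exists:

```python
import numpy as np

# Hypothetical design matrix with linearly dependent columns,
# so X^T X is singular and (X^T X)^{-1} X^T y does not exist.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

lam = 0.1  # regularization parameter, lambda > 0

# det(X^T X) is 0, confirming singularity.
print(np.linalg.det(X.T @ X))

# Regularized normal equations: (X^T X + lambda*I) w = X^T y.
A = X.T @ X + lam * np.eye(X.shape[1])
w = np.linalg.solve(A, X.T @ y)
print(w)
```

For any λ > 0, X^T X + λI is symmetric positive definite (its eigenvalues are those of X^T X shifted up by λ), which is why `np.linalg.solve` succeeds here.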


(35 points) In logistic regression, the objective function can be written as

    E(w) = (1/N) Σ_{n=1}^{N} ln(1 + e^{-y_n w^T x_n}).
Please

(10 points) Compute the first-order derivative ∇E(w). You will need to provide the intermediate steps of the derivation.
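Differentiating each term of E(w) gives ∇E(w) = -(1/N) Σ_n y_n x_n / (1 + e^{y_n w^T x_n}). A small sketch (using NumPy; the data below is random and illustrative only, not part of the assignment) that checks this formula against a finite-difference approximation:

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of E(w) = (1/N) * sum_n ln(1 + exp(-y_n w^T x_n)).

    Each term differentiates to
      -y_n x_n * exp(-y_n w^T x_n) / (1 + exp(-y_n w^T x_n))
      = -y_n x_n / (1 + exp(y_n w^T x_n)).
    """
    s = y * (X @ w)  # signals y_n * w^T x_n, shape (N,)
    return -(X * (y / (1 + np.exp(s)))[:, None]).mean(axis=0)

def E(w, X, y):
    return np.mean(np.log(1 + np.exp(-y * (X @ w))))

# Finite-difference check on small random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([1, -1, 1, -1, 1], dtype=float)
w = rng.normal(size=3)

eps = 1e-6
num = np.array([(E(w + eps * np.eye(3)[i], X, y)
                 - E(w - eps * np.eye(3)[i], X, y)) / (2 * eps)
                for i in range(3)])
print(np.max(np.abs(gradient(w, X, y) - num)))  # should be tiny
```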

(10 points) Once the optimal w is obtained, it will be used to make predictions as follows:

    Predicted class of x = { +1  if θ(w^T x) ≥ 0.5
                           { -1  if θ(w^T x) < 0.5

where θ(z) = 1/(1 + e^{-z}). [Figure: plot of the sigmoid function θ(z).]
Explain why the decision boundary of logistic regression is still linear, though the linear signal w^{T} x is passed through a nonlinear function to compute the outcome of prediction.

(5 points) Is the decision boundary still linear if the prediction rule is changed to the following? Justify briefly.

    Predicted class of x = { +1  if θ(w^T x) ≥ 0.9
                           { -1  if θ(w^T x) < 0.9
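A hint worth verifying numerically: because θ is strictly increasing, any threshold on θ(w^T x) is equivalent to a threshold on w^T x itself. A minimal sketch (standard library only) solving θ(z) = 0.9 for z:

```python
import math

def theta(z):
    """Logistic (sigmoid) function."""
    return 1 / (1 + math.exp(-z))

# theta is strictly increasing, so theta(w^T x) >= 0.9 is equivalent
# to w^T x >= theta^{-1}(0.9). Solving 1/(1+e^{-z}) = 0.9 gives
# z = ln(0.9/0.1) = ln 9.
z = math.log(9)
print(theta(z))  # 0.9 (up to floating point)
```

The condition w^T x ≥ ln 9 is still a linear inequality in x, just with a shifted threshold.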


(10 points) In light of your answers to the above two questions, what is the essential property of logistic regression that results in the linear decision boundary?


(35 points) Logistic Regression for Handwritten Digit Recognition: Implement logistic regression for classification using gradient descent to find the best separator. The handwritten digit files are in the "data" folder: train.txt and test.txt. The starter code is in the "code" folder. In the data file, each row is a data example: the first entry is the digit label ("1" or "5"), and the next 256 entries are grayscale values between -1 and 1. The 256 pixels correspond to a 16x16 image. You are expected to implement your solution based on the given code. The only file you need to modify is the "solution.py" file. You can test your solution by running the "main.py" file. Note that code is provided to compute a two-dimensional feature (symmetry and average intensity) from each digit image; that is, each digit image is represented by a two-dimensional vector before being augmented with a "1" to form a three-dimensional vector, as discussed in class. These features, along with the corresponding labels, should serve as inputs to your logistic regression algorithm.


(15 points) Complete the logistic_regression() function for classifying the digits "1" and "5".
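A minimal sketch of the gradient-descent training loop, assuming labels in {+1, -1} and augmented feature rows; the function name follows the assignment, but the exact signature and hyperparameter names expected by the starter code may differ:

```python
import numpy as np

def logistic_regression(X, y, learning_rate=0.1, max_iter=1000):
    """Fit w by gradient descent on the logistic loss.

    X : (N, d) array of augmented feature vectors (first column is 1).
    y : (N,) array of labels in {+1, -1}.
    Returns the learned weight vector w of shape (d,).
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        s = y * (X @ w)  # signals y_n * w^T x_n
        # Gradient of (1/N) * sum_n ln(1 + exp(-y_n w^T x_n)).
        grad = -(X * (y / (1 + np.exp(s)))[:, None]).mean(axis=0)
        w -= learning_rate * grad
    return w
```

A fixed iteration count is used for simplicity; the starter code may instead stop when ||∇E(w)|| falls below a tolerance.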



(5 points) Complete the accuracy() function for measuring the classification accuracy on your training and test data.
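A sketch of the accuracy computation, assuming sign-based prediction and labels in {+1, -1}; the starter code's actual signature may differ:

```python
import numpy as np

def accuracy(X, y, w):
    """Fraction of examples whose predicted class matches the label.

    Predictions are sign(w^T x); y holds labels in {+1, -1}.
    """
    preds = np.sign(X @ w)
    return np.mean(preds == y)
```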



(5 points) Complete the third_order() function to transform the features into the 3rd-order polynomial Z-space.
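For a two-dimensional feature (x1, x2), the 3rd-order polynomial Z-space contains all monomials up to degree 3. A sketch of the transform (the term ordering expected by the starter code may differ):

```python
import numpy as np

def third_order(X):
    """Map 2-D features (x1, x2) to the 3rd-order polynomial Z-space:
    (1, x1, x2, x1^2, x1*x2, x2^2, x1^3, x1^2*x2, x1*x2^2, x2^3).

    X : (N, 2) array; returns an (N, 10) array.
    """
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([np.ones_like(x1), x1, x2,
                     x1**2, x1 * x2, x2**2,
                     x1**3, x1**2 * x2, x1 * x2**2, x2**3], axis=1)
```

For example, third_order applied to the single point (2, 3) yields (1, 2, 3, 4, 6, 9, 8, 12, 18, 27).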



(10 points) Run the "main.py" file to see the classification results. As your final deliverable to a customer, would you use the linear model with or without the 3rd-order polynomial transform? Briefly explain your reasoning.

Deliverable: You should submit (1) a hardcopy report (along with your write-up for the other questions) that summarizes your results, and (2) the "solution.py" file to Blackboard.
Note: Please read the "Readme.txt" file carefully before you start this assignment. Please do NOT change anything in the "main.py" and "helper.py" files when you program.