Description
Problems:

(50 points) Check out the files logistic_reg.m and find_test_error.m from the SVN repository set up for this assignment. The files are just function headers that need to be filled in. find_test_error should encode a function that, given as inputs a weight vector w, a data matrix X, and a vector of true labels y (in the formats defined in the header), returns the classification error of w on the data, assuming that the classifier applies a threshold at 0 to the dot product of w and a feature vector x (augmented with a 1 in the first position of the vector to allow for a constant or bias term). logistic_reg should encode a gradient descent algorithm for learning a logistic regression model. It should return a weight vector w and the training set error E_{in} (not the classification error, but the negative log-likelihood function) as defined in class. Use a learning rate of 10^{-5} and automatically terminate the algorithm if the magnitude of each term in the gradient is below 10^{-3} at any step.
Implement the functions in the two files. Remember to check in the final version of your code for these two files.
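As a language-neutral illustration of the two functions described above, here is a minimal sketch in plain Python (the assignment itself must be completed in MATLAB). It rests on several assumptions: labels are already -1/+1, E_{in} takes the form (1/N) sum_n ln(1 + exp(-y_n w.x_n)) used in LFD, a score of exactly 0 is classified as -1, and the argument names and order (w_init, max_its, eta, tol) are hypothetical rather than taken from the provided headers.

```python
import math

def _dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def _augment(X):
    # Prepend the constant 1 so that w[0] acts as the bias term.
    return [[1.0] + list(row) for row in X]

def find_test_error(w, X, y):
    """Fraction of rows misclassified when thresholding w . [1, x] at 0."""
    wrong = 0
    for x, label in zip(_augment(X), y):
        # Assumption: a score of exactly 0 is mapped to the -1 class.
        predicted = 1 if _dot(w, x) > 0 else -1
        wrong += predicted != label
    return wrong / len(y)

def _sigmoid_neg(z):
    # Numerically stable 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def _log_loss(z):
    # Numerically stable ln(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def logistic_reg(X, y, w_init, max_its, eta=1e-5, tol=1e-3):
    """Batch gradient descent; returns (w, E_in) with E_in the mean log loss."""
    Xa = _augment(X)
    n = len(y)
    w = list(w_init)
    for _ in range(max_its):
        # Gradient of E_in: -(1/N) * sum_n y_n x_n / (1 + exp(y_n w.x_n)).
        grad = [0.0] * len(w)
        for x, label in zip(Xa, y):
            coeff = -label * _sigmoid_neg(label * _dot(w, x)) / n
            for j, xj in enumerate(x):
                grad[j] += coeff * xj
        # Stop once every gradient component is below the tolerance.
        if all(abs(g) < tol for g in grad):
            break
        w = [wj - eta * g for wj, g in zip(w, grad)]
    e_in = sum(_log_loss(label * _dot(w, x)) for x, label in zip(Xa, y)) / n
    return w, e_in
```

The termination test checks each gradient component separately, as the problem statement specifies, rather than the norm of the gradient.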
Read more about the “Cleveland” dataset we’ll be using here: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Learn a logistic regression model on the data in cleveland.train (be careful: the classes are labeled 0/1, so you should convert them to -1/+1 so that everything we’ve done in class is still valid). Apply the model to classify the data in cleveland.test, using a probability of 0.5 as the threshold. In your writeup, report E_{in} as well as the classification error on both the training and test data when using three different bounds on the maximum number of iterations: ten thousand, one hundred thousand, and one million. What can you say about the generalization properties of the model?
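The label conversion and the probability threshold can be sketched as follows, again in plain Python for illustration only. The helper names are hypothetical. The sketch assumes the model probability is the logistic function of the score w.[1, x]; under that assumption, a probability above 0.5 corresponds exactly to a score above 0, so the two thresholding rules agree.

```python
import math

def to_signed_labels(labels01):
    # Map 0 -> -1 and 1 -> +1.
    return [2 * int(v) - 1 for v in labels01]

def predicted_probability(w, x):
    """P(y = +1 | x) under the logistic model, with w[0] as the bias term."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    # Numerically stable logistic function of s.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def classify(w, x, threshold=0.5):
    # Assumption: a probability exactly at the threshold maps to -1.
    return 1 if predicted_probability(w, x) > threshold else -1
```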
Now train and test a logistic regression model using the built-in MATLAB function glmfit (learn about and use the “binomial” option, and check the label format). Compare the results with the best ones you achieved, and also compare the time taken to achieve them.
Now scale the features by subtracting the mean and dividing by the standard deviation for each of the features in advance of calling the learning algorithm (you may find the MATLAB function zscore useful). Experiment with the learning rate (you may want to start by trying different orders of magnitude), this time using a tolerance (how close to zero each element of the gradient must be for the algorithm to terminate) of 10^{-6}. Report the number of iterations until the algorithm terminates, as well as the final E_{in}.
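The feature scaling step can be sketched in plain Python as below, for illustration only. The function names are hypothetical. The sketch uses the sample standard deviation (dividing by N-1) and maps constant columns to 0, which is an assumption about the intended convention; check it against whatever scaling routine you actually use.

```python
import statistics

def fit_standardizer(X):
    """Per-column mean and sample standard deviation of the rows in X."""
    columns = list(zip(*X))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds

def apply_standardizer(X, means, stds):
    # Constant columns (std == 0) are mapped to 0 to avoid division by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in X
    ]
```

Separating the fit from the application makes it possible to compute the statistics on the training data and apply the same transformation to the test data, which is one reasonable design choice and not something the problem statement prescribes.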

(15 points) LFD Problem 3.4

(10 points) LFD Problem 3.19

(10 points) LFD Problem 4.8

(15 points) LFD Problem 4.25, parts (a) through (c) only