Description
Introduction
In this assignment you will explore gradient descent and perform linear regression on a dataset, using cross-validation to analyze your results.
As with all homeworks, you cannot use any functions that are against the "spirit" of the assignment. For this assignment that would mean any linear regression functions. You may use statistical and linear algebra functions to do things like:
• mean, std, cov
• inverse
• matrix multiplication, transpose
• etc.
And as always your code should work on any dataset that has the same general form as the provided one.
Grading
Although all assignments will be weighted equally in computing your homework grade, below is the grading rubric we will use for this assignment:

Part 1 (Theory)            15pts
Part 2 (Gradient Descent)  20pts
Part 3 (Closed-Form LR)    40pts
Part 4 (S-Folds LR)        15pts
Report                     10pts
TOTAL                      100pts

Table 1: Grading Rubric
Fish Length Dataset (x06Simple.csv)
This dataset consists of 44 rows of data, each of the form:
• Index
• Age (days)
• Temperature of Water (degrees Celsius)
• Length of Fish
The first row of the data contains header information.
Data obtained from: http://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
1 Theory
(10pts) Consider the following data, where each row is one sample:

$$\begin{bmatrix}
-2 & 1 \\
-5 & -4 \\
-3 & 1 \\
0 & 3 \\
-8 & 11 \\
-2 & 5 \\
1 & 0 \\
5 & -1 \\
-1 & -3 \\
6 & 1
\end{bmatrix}$$
Compute the coefficients for the linear regression using the least squares estimate (LSE), where the second value (column) is the dependent variable (the value to be predicted) and the first column is the sole feature. Show your work and remember to add a bias feature and to standardize the features. Compute this model using all of the data (don't worry about separating into training and testing sets).
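Once you have a hand computation, the closed-form estimate $\theta = (X^T X)^{-1} X^T y$ is easy to sanity-check in a few lines of NumPy. This is only an illustrative sketch, not part of the assignment: the function name is made up, and it assumes sample standard deviation (ddof=1) for the standardization step.

```python
import numpy as np

def lse_coefficients(x, y):
    """Least squares estimate theta = (X^T X)^{-1} X^T y, after
    standardizing the single feature and prepending a bias column.
    Assumes sample std (ddof=1) for standardization."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)         # standardize the feature
    X = np.column_stack([np.ones_like(z), z])  # bias feature + standardized x
    return np.linalg.inv(X.T @ X) @ X.T @ y
```

For example, `lse_coefficients([1, 2, 3], [2, 4, 6])` returns approximately `[4.0, 2.0]`: with the feature standardized to [-1, 0, 1], the intercept is the mean of y and the slope is per standardized unit.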


For the function $g(x) = (x - 1)^4$, where x is a single value (not a vector or matrix):


What is the gradient with respect to x? Show your work to support your answer.



What is the global minimum of g(x)? Show your work to support your answer.



Plot x vs. g(x) using a software package of your choosing.

2 Gradient Descent
In this section we want to visualize the gradient descent process on the function $g(x) = (x - 1)^4$. You should have already derived (pun?) the gradient of this function in the theory section. To bootstrap the process, initialize x = 0 and terminate the process when the change in x from one iteration to the next is less than $2^{-23}$.
2.1 Fixed Learning Rate
First experiment with your choice of the learning rate, η. From your theory work you should know what the actual minimum is. A common starting guess is η = 1.0.
In your report you will need:
• Plot of iteration vs. g(x).
• Plot of iteration vs. x.
• Chosen value of η.
2.2 Adaptive Learning Rate
Next let's try to "intelligently" adapt the learning rate. Start with η = 1.0 and reduce η by 1/2 whenever the sign of the gradient changes. A sign change indicates that we may have over-jumped the minimum.
In your report you will need:
• Plot of iteration vs. g(x).
• Plot of iteration vs. x.
3 Closed-Form Linear Regression
Download the dataset x06Simple.csv from Blackboard. This dataset has header information in its first row and all subsequent rows are in the format:

Row Id, $X_{i,1}$, $X_{i,2}$, $Y_i$

Your code should work on any CSV dataset whose first row is header information and whose first column is some integer index, followed by D columns of real-valued features, and ending with a target value.
Write a script that:
1. Reads in the data, ignoring the first row (header) and first column (index).
2. Randomizes the data.
3. Selects the first 2/3 (round up) of the data for training and the remaining for testing.
4. Standardizes the data (except for the last column, of course) using the training data.
5. Computes the closed-form solution of linear regression.
6. Applies the solution to the testing samples.
7. Computes the root mean squared error (RMSE): $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(Y_i - \hat{Y}_i)^2}$, where $\hat{Y}_i$ is the predicted value for observation $X_i$.
Implementation Details
• Seed the random number generator with zero prior to randomizing the data.
• Don't forget to add in the bias feature!
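The steps above can be sketched roughly as below. This is a hedged outline, not a reference solution: the function name is made up, the data is passed in as an already-parsed array (in the real script you would read x06Simple.csv and drop the header row and index column first), it uses NumPy's `default_rng` for the seeded shuffle, and it assumes sample std (ddof=1) for standardization.

```python
import numpy as np

def train_test_rmse(data, seed=0):
    """Closed-form linear regression with a 2/3 (round up) / 1/3 split.
    `data` is an (N, D+1) array: D feature columns, then the target column.
    Returns (theta, RMSE on the test split)."""
    rng = np.random.default_rng(seed)            # seed before randomizing
    data = data[rng.permutation(len(data))]      # shuffle the rows
    n_train = int(np.ceil(2 * len(data) / 3))    # first 2/3, rounded up
    train, test = data[:n_train], data[n_train:]

    Xtr, ytr = train[:, :-1], train[:, -1]
    Xte, yte = test[:, :-1], test[:, -1]

    # Standardize features using TRAINING statistics only, then add bias.
    mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0, ddof=1)
    Xtr = np.column_stack([np.ones(len(Xtr)), (Xtr - mu) / sd])
    Xte = np.column_stack([np.ones(len(Xte)), (Xte - mu) / sd])

    theta = np.linalg.inv(Xtr.T @ Xtr) @ Xtr.T @ ytr   # closed-form solution
    resid = yte - Xte @ theta
    return theta, float(np.sqrt(np.mean(resid ** 2)))
```

On noise-free linear data the RMSE comes out at essentially zero, which is a handy self-test before running on the fish dataset.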
In your report you will need:
• The final model in the form $y = \theta_0 + \theta_1 x_1 + \ldots$
• The root mean squared error.
RMSE | Around 800

Table 2: Closed-Form Regression Evaluation
4 S-Folds Cross-Validation
Cross-validation is a technique used to get reliable evaluation results when we don't have that much data (and it is therefore difficult to train and/or test a model reliably).
In this section you will do S-Folds cross-validation for a few different values of S. For each run you will divide your data up into S parts (folds), test S different models using S-Folds cross-validation, and evaluate via root mean squared error. In addition, to observe the effect of system variance, we will repeat these experiments several times (shuffling the data each time prior to creating the folds). We will again be doing our experiment on the provided fish dataset.
Write a script that:
1. Reads in the data, ignoring the first row (header) and first column (index).
2. 20 times does the following:
   (a) Randomizes the data.
   (b) Creates S folds.
   (c) For i = 1 to S:
       i. Select fold i as your testing data and the remaining (S − 1) folds as your training data.
       ii. Standardize the data (except for the last column, of course) based on the training data.
       iii. Train a closed-form linear regression model.
       iv. Compute the squared error for each sample in the current testing fold.
   (d) You should now have N squared errors. Compute the RMSE for these.
3. You should now have 20 RMSE values. Compute the mean and standard deviation of these. The former should give us a better "overall" mean, whereas the latter should give us a feel for the variance of the models that were created.
Implementation Details
• Don't forget to add in the bias feature!
• Set your seed value at the very beginning of your script (if you set it within the 20 tests, each test will have the same randomly shuffled data!).
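The repeated S-fold loop can be sketched as below. As before this is an illustrative outline under stated assumptions: the function name is made up, the data arrives as an already-parsed (N, D+1) array, `np.array_split` is used to form (possibly unequal) folds, the generator is seeded once before all 20 repeats (per the note above), and standardization uses sample std (ddof=1) from the training folds only.

```python
import numpy as np

def sfold_rmse(data, S, repeats=20, seed=0):
    """Repeated S-fold cross-validation with closed-form linear regression.
    `data` is an (N, D+1) array: D feature columns, then the target column.
    Returns (mean, std) of the `repeats` RMSE values."""
    rng = np.random.default_rng(seed)   # seed once, before all the repeats
    rmses = []
    for _ in range(repeats):
        shuffled = data[rng.permutation(len(data))]   # reshuffle each repeat
        folds = np.array_split(shuffled, S)
        sq_errors = []
        for i in range(S):
            test = folds[i]
            train = np.vstack([folds[j] for j in range(S) if j != i])
            Xtr, ytr = train[:, :-1], train[:, -1]
            Xte, yte = test[:, :-1], test[:, -1]

            # Standardize with training-fold statistics, then add bias.
            mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0, ddof=1)
            Xtr = np.column_stack([np.ones(len(Xtr)), (Xtr - mu) / sd])
            Xte = np.column_stack([np.ones(len(Xte)), (Xte - mu) / sd])

            theta = np.linalg.inv(Xtr.T @ Xtr) @ Xtr.T @ ytr
            sq_errors.extend((yte - Xte @ theta) ** 2)

        # N squared errors -> one RMSE per repeat
        rmses.append(np.sqrt(np.mean(sq_errors)))
    return float(np.mean(rmses)), float(np.std(rmses))
```

Running it with S equal to the number of rows gives leave-one-out cross-validation, matching the last report item below.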
In your report you will need:
• The average and standard deviation of the root mean squared error for S = 3 over the 20 different seed values.
• The average and standard deviation of the root mean squared error for S = 5 over the 20 different seed values.
• The average and standard deviation of the root mean squared error for S = 20 over the 20 different seed values.
• The average and standard deviation of the root mean squared error for S = N (where N is the number of samples) over the 20 different seed values. This is basically leave-one-out cross-validation.

S  | Average RMSE | Std of RMSE
3  | 650          | 45
5  | 650          | 35
20 | 620          | 10
N  | 620          | 0

Table 3: Evaluation Using S-Fold Cross-Validation
5 Submission
For your submission, upload to Blackboard a single zip file, with no spaces in the file or directory names, that contains:
• PDF Writeup
• Source Code
• readme.txt file
The readme.txt file should contain information on how to run your code to reproduce the results for each part of the assignment.
The PDF document should contain the following:
• Part 1:
  – Your solutions to the theory questions
• Part 2:
  – Your two figures using an η value of your choosing, as well as that value.
  – Your two figures with an adaptive η.
• Part 3:
  – Final Model
  – RMSE
• Part 4:
  – Average and standard deviations of RMSEs for the different cross-validations.