Final Project Solution

$30.00 $24.90

Description

This is the nal project of the course. You need to submit this prior to coming to the in-class written exam. This nal project has 3 parts, one about applications of linear algebra models, one about discrete and network models, and nally one about convex models.

Solutions need to be submitted via CANVAS using the uploading online capa-bilities. Other methods of submission without prior approval will receive zero points. Submit les that contains all code (with comments), and all pdf latex docs. Dont forget to submit the data you used (specially if you reformatted things).

Make sure the les run the code from simply calling them, e.g., all data les should be included. We will not download les for you or retype or copy paste your code. If it does not run straight out of the box, you will receive zero points.

Be organized and use the notation appropriately. Show your work on every problem. Correct answers with no support work will not receive full credit.

  1. Challenge: Automatic Classi cation of images via SVD decomposition

You will write an algorithm in MATLAB for the classi cation of handwritten digits. For this you can download the following les from https://www.math.ucdavis.edu/~deloera/TEACHING/ MATH160/guessdigit-project.zip

NOTE: Data comes in compressed form, to open you type (after downloading), unzip guessdigit-project.zip. Inside the directory that will open up you will nd 5 les:

ima2.m — Code Displays an image vector in the right orientation

azip.mat — the matrix of training digits data

dzip.mat — command dzip(i) tells you the (correct) digit you have in column

i of the matrix azip

dtest.mat — tells you the (correct answer) of test digits

testzip.mat — the test digits data

Use the training set, and compute the SVD of each class matrix (classes are those matrices that represent the same digit). Use the rst few (5, 10, 20) singular vectors as basis of a class and classify unknown test digits according to how well they can be represented in terms of the respective bases (use the relative residual vector in the least squares problem as a measure). Here are some speci c tasks.

Write your code to do classi cation, it brakes the training data in classes, computes the SVD of each class and uses that to make predictions. It takes in a test data point and makes a prediction.

Tune the algorithm for accuracy of classi cation. Give a table or graph of the percentage of correctly classi ed digits as a function of the number of basis vectors. Graph the situation for 5, 10, 20 basis vectors. Display the results in a table (or tables).

Check the singular values of the di erent classes. Is it reasonable to use di erent numbers of basis vectors for di erent classes? If so, perform a few experiments to nd out if it really pays o to use fewer basis vectors in one or two of the classes (i.e., do you get di erent/same outcome?).

Check if all digits are equally easy or di cult to classify. Also look at one of the di cult ones, and see that in many cases they are very badly written. What is the most di cult digit to read for the computer? Does it help to increase the number of singular vectors you used? Write comments at the very end of your code with your thoughts.

  1. Challenge: Models for predicting the quality of wines.

In this problem we will use two types of convex-linear models to to predict wine quality (as judged

by enologists) from chemical measurements. The dataset you need to use is available at http: //www.math.ucdavis.edu/~deloera/TEACHING/MATH160/winesinfo.csv. In each line, the rst 11 columns contain the results from various chemical tests performed on the wine, and the last column is the evaluation of how good the wine is (a score between 0 and 10).

First, consider a model based on Linear programming. For wine sample i, let us denote by yi 2 R its score and by xi 2 R11 its chemical properties. Construct a linear model to predict yi as a function of xi, that is, we want to nd a 2 R11 and b 2 R such that:

yi ‘ a|xi + b:

The quality of the model will be evaluated using the `1 norm, i.e., we want to nd a solution to this optimization problem:

n

min 1 Xjyi a|xi bj:

a2R11 n

b2R i=1

a. Remember from class that the above problem is equivalent to the following linear program:

n

min 1 Xzi

a2R11 n

b2R i=1

z2Rn

s.t. zi yi a|xi b; 1 i n

zi a|xi + b yi; 1 i n

Explain how to rewrite this problem in matrix form:

min c|d

d2R12+n

s.t. Ad b

In particular, give the dimensions and de nitions of c; A and b.

  1. Use an LP solver (again, we recommend using SCIP, MATLAB works too) to solve the above problem and report your code as well as the optimal value of the problem. Note that the value of the problem is exactly the average absolute error of the linear model on the dataset. Does it seem to be within an acceptable range?

In this next part, you will re-use the dataset to t a di erent linear model to predict wine quality as a function of the chemical measurements, but we will use least-squares regression instead of `1-regression.

    1. Compute the optimal solution to the least-squares regression problem. Report your code, the linear model (a and b) and the value of function RSS for this model.

    1. Now sparsify the model to nd the key 4 features of the wine that make it a good wine. What are the top 4 features for deciding quality of wines according to a LASSO model?

  1. Challenge: How to in uence the opinion of people when on a budget (Facebook?).

It is in fashion to in uence voters in elections. In this problem, you will consider a simpli ed model of in uence in social networks: the social network is represented by an undirected graph G = (V; E) and for a set of nodes S V , we denote by N(S) the set of their neighbors:

N(S) = fv 2 V j 9u 2 S; (u; v) 2 Eg:

The in uence I(S) of a set of nodes is measured by I(S) = jN(S)j.

Imagine you work now for facebook and you are given the dataset available at http://www.math. ucdavis.edu/~deloera/TEACHING/MATH160/facebookgraph.txt. This dataset is in fact a sub-graph of the Facebook social graph. Each line in the le contains the id of two users, indicating that these two users are friends on Facebook.

  1. As the designer of a marketing campaign to in uence the opinion of voters, your goal is to nd

a subset S V of at most K nodes whose in uence is maximal. Write a mathematical model to solve this problem. Can you solve the problem for the data you were given using SCIP? If not, can you write a practical method to give an approximation to the optimum? Extra bonus if you can prove a guarantee of approximation.

  1. Using any of your models/methods from part (a) write a computer function which, given the social network described in the dataset and a budget K 2 N+, returns an approximately optimal set of nodes S for the in uence function I(S). The function should return both the users to in uence and the value (total amount of in uence) obtained.

  1. Plot the in uence I(S) obtained by your function as a function of the budget K. E.g., what happens when K = 1 and K = jV j?

  1. In the de nition of I(S) a node with largest degree is one of the most in uential persons, i.e., he/she is a VIP. Can you think of a way to adapt the pagerank algorithm to identify VIP nodes in the graph? Can you invent of a third additional way to de ne who are the VIPs in this graph? What else could you use to measure in uence?