# STAT Homework #2 Solution


Instructions: You may discuss the homework problems in small groups, but you must write up the final solutions and code yourself. Please turn in your code for the problems that involve coding. However, for the problems that involve coding, you must also provide written answers: you will receive no credit if you submit code without written answers. You might want to use Rmarkdown to prepare your assignment.

1. Suppose we have a quantitative response Y and a single feature X ∈ ℝ. Let RSS₁ denote the residual sum of squares that results from fitting the model

   Y = β₀ + β₁X + ε

   using least squares. Let RSS₁₂ denote the residual sum of squares that results from fitting the model

   Y = β₀ + β₁X + β₂X² + ε

   using least squares.

   (a) Prove that RSS₁₂ ≤ RSS₁.

   (b) Prove that the R² of the model containing just the feature X is no greater than the R² of the model containing both X and X².
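A quick numerical illustration of (a), on hypothetical simulated data (the names x, y, fit1, and fit12 are mine, not from the assignment): refitting with the extra X² term can only lower the residual sum of squares, since the smaller model is a special case of the larger one (β₂ = 0).

```r
# Hypothetical simulated data; any (x, y) pairs would behave the same way.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

fit1  <- lm(y ~ x)            # model with X only
fit12 <- lm(y ~ x + I(x^2))   # model with X and X^2

RSS1  <- sum(resid(fit1)^2)   # residual sum of squares, X-only model
RSS12 <- sum(resid(fit12)^2)  # residual sum of squares, X and X^2 model

RSS12 <= RSS1   # TRUE: the larger model fits at least as well
```

The same comparison immediately gives (b), since R² = 1 − RSS/TSS and both models share the same total sum of squares.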

2. Describe the null hypotheses to which the p-values in Table 3.4 of the textbook correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.

3. Consider a linear model with just one feature,

   Y = β₀ + β₁X + ε.

   Suppose we have n observations from this model, (x₁, y₁), …, (xₙ, yₙ). The least squares estimator is given in (3.4) of the textbook. Furthermore, we saw in class that if we construct an n × 2 matrix X̃ whose first column is a vector of 1's and whose second column is a vector with elements x₁, …, xₙ, and if we let y denote the vector with elements y₁, …, yₙ, then the least squares estimator takes the form

   (β̂₀, β̂₁)ᵀ = (X̃ᵀX̃)⁻¹X̃ᵀy.                                        (1)

   Prove that (1) agrees with equation (3.4) of the textbook, i.e. that β̂₀ and β̂₁ in (1) equal β̂₀ and β̂₁ in (3.4).
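As a numerical sanity check (not a proof), the sketch below compares the matrix formula (1) against the sample-moment formulas of (3.4) on hypothetical simulated data; the two computations agree to numerical precision.

```r
# Hypothetical simulated data for checking (1) against (3.4).
set.seed(2)
n <- 50
x <- runif(n)
y <- 1 - 2 * x + rnorm(n, sd = 0.3)

# Matrix form (1): (beta0-hat, beta1-hat)' = (X~'X~)^{-1} X~'y
Xt <- cbind(1, x)                               # n x 2 design matrix
beta_matrix <- solve(t(Xt) %*% Xt, t(Xt) %*% y)

# Closed form (3.4): slope from sample moments, then the intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

all.equal(as.numeric(beta_matrix), c(b0, b1))   # TRUE
```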

4. This question involves the use of multiple linear regression on the Auto data set, which is available as part of the ISLR library.

   (a) Use the lm() function to perform a multiple linear regression with mpg as the response and all other variables except name as the predictors. Use the summary() function to print the results. Comment on the output. For instance:

       i. Is there a relationship between the predictors and the response?
       ii. Which predictors appear to have a statistically significant relationship to the response?
       iii. Provide an interpretation for the coefficient associated with the variable year.

   Make sure that you treat the qualitative variable origin appropriately.

   (b) Try out some models to predict mpg using functions of the variable horsepower. Comment on the best model you obtain. Make a plot with horsepower on the x-axis and mpg on the y-axis that displays both the observations and the fitted function (i.e. f̂(horsepower)).

   (c) Now fit a model to predict mpg using horsepower, origin, and an interaction between horsepower and origin. Make sure to treat the qualitative variable origin appropriately. Comment on your results. Provide a careful interpretation of each regression coefficient.
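A minimal sketch of the model form in (c), using a toy data frame as a stand-in for the real Auto data (the toy values and coefficients below are invented for illustration). The key points are that origin must be converted to a factor so lm() creates dummy variables rather than treating 1/2/3 as a number, and that the interaction gives each origin its own horsepower slope.

```r
# Toy stand-in for the Auto data (hypothetical values, not from ISLR).
set.seed(3)
n <- 120
toy <- data.frame(
  horsepower = runif(n, 50, 200),
  origin     = sample(1:3, n, replace = TRUE)  # coded 1, 2, 3 as in Auto
)
toy$mpg <- 40 - 0.1 * toy$horsepower + 2 * (toy$origin == 3) + rnorm(n)

# factor(origin) yields one dummy per non-baseline level; the * operator
# adds both main effects and the horsepower-by-origin interaction,
# so each origin level gets its own intercept and horsepower slope.
fit <- lm(mpg ~ horsepower * factor(origin), data = toy)
summary(fit)
```

With the real data, the same formula applies after loading ISLR: `lm(mpg ~ horsepower * factor(origin), data = Auto)`.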

5. Consider fitting a model to predict credit card balance using income and student, where student is a qualitative variable that takes on one of three values: student ∈ {graduate, undergraduate, not student}.

   (a) Encode the student variable using two dummy variables, one of which equals 1 if student=graduate (and 0 otherwise), and one of which equals 1 if student=undergraduate (and 0 otherwise). Write out an expression for a linear model to predict balance using income and student, using this coding of the dummy variables. Interpret the coefficients in this linear model.

   (b) Now encode the student variable using two dummy variables, one of which equals 1 if student=not student (and 0 otherwise), and one of which equals 1 if student=graduate (and 0 otherwise). Write out an expression for a linear model to predict balance using income and student, using this coding of the dummy variables. Interpret the coefficients in this linear model.

   (c) Using the coding in (a), write out an expression for a linear model to predict balance using income, student, and an interaction between income and student. Interpret the coefficients in this model.

   (d) Using the coding in (b), write out an expression for a linear model to predict balance using income, student, and an interaction between income and student. Interpret the coefficients in this model.

   (e) Using simulated data for balance, income, and student, show that the fitted values (predictions) from the models in (a)–(d) do not depend on the coding of the dummy variables (i.e. the models in (a) and (b) yield the same fitted values, as do the models in (c) and (d)).
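For part (e), a sketch along these lines works (all simulated values below are hypothetical): build the two dummy codings by hand, fit the models, and compare fitted values. The fits coincide because the two sets of dummy variables span the same column space — any two of the three indicators, together with the intercept, encode the same three-group structure.

```r
# Hypothetical simulated data for balance, income, and student.
set.seed(4)
n <- 150
income  <- runif(n, 20, 100)
student <- sample(c("graduate", "undergraduate", "not student"),
                  n, replace = TRUE)
balance <- 200 + 5 * income + 100 * (student == "graduate") +
           rnorm(n, sd = 20)

# Indicator variables for each level of student.
grad  <- as.numeric(student == "graduate")
under <- as.numeric(student == "undergraduate")
nonst <- as.numeric(student == "not student")

# Coding (a): dummies for graduate and undergraduate.
fit_a <- lm(balance ~ income + grad + under)
# Coding (b): dummies for "not student" and graduate.
fit_b <- lm(balance ~ income + nonst + grad)
all.equal(fitted(fit_a), fitted(fit_b))   # TRUE: identical predictions

# Codings (c) and (d): same comparison with income interactions.
fit_c <- lm(balance ~ (grad + under) * income)
fit_d <- lm(balance ~ (nonst + grad) * income)
all.equal(fitted(fit_c), fitted(fit_d))   # TRUE again
```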

6. Extra Credit. Consider a linear model with just one feature,

   Y = β₀ + β₁X + ε,

   with E(ε) = 0 and Var(ε) = σ². Suppose we have n observations from this model, (x₁, y₁), …, (xₙ, yₙ). We assume that x₁, …, xₙ are fixed, so the only randomness in the model comes from ε₁, …, εₙ. Use (3.4) in the textbook (or, if you prefer, the matrix algebra formulation in (1) of this homework assignment) in order to derive the expressions for Var(β̂₀) and Var(β̂₁) given in (3.8) of the textbook.
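A Monte Carlo check of the slope part of (3.8), Var(β̂₁) = σ² / Σᵢ(xᵢ − x̄)², under an assumed true model with hypothetical parameter values (this only verifies the formula numerically; the assignment asks for a derivation). Holding the design x fixed and redrawing only the errors, the empirical variance of β̂₁ across simulated data sets should be close to the theoretical value.

```r
# Fixed design, hypothetical true model Y = 1 + 2X + eps, sd(eps) = 2.
set.seed(5)
n <- 30
sigma <- 2
x <- runif(n)                                   # x's are fixed throughout
theory <- sigma^2 / sum((x - mean(x))^2)        # Var(beta1-hat) from (3.8)

# Redraw the errors many times; recompute the slope estimate each time.
b1 <- replicate(5000, {
  y <- 1 + 2 * x + rnorm(n, sd = sigma)
  sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
})

c(empirical = var(b1), theoretical = theory)    # should be close
```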
