Homework Set Two Solution



  1. Problem 2.6.26 in Duda, Hart, and Stork (DHS).

  1. In this problem we will consider the ML estimate of the parameters of a multinomial distribution. Consider a random variable $X$ such that $P_X(k) = \pi_k$, $k \in \{1, \ldots, N\}$. Suppose we draw $n$ independent observations from $X$ and form a random vector $C = (C_1, \ldots, C_N)^T$, where $C_k$ is the number of times that the observed value is $k$ (i.e. $C$ is the histogram of the sample of observations). Then $C$ has a multinomial distribution

$$P_{C_1, \ldots, C_N}(c_1, \ldots, c_N) = \frac{n!}{\prod_{k=1}^{N} c_k!} \prod_{j=1}^{N} \pi_j^{c_j}.$$
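
To make the multinomial formula concrete, here is a minimal Python sketch that evaluates it for a small histogram. The function name `multinomial_pmf` and the example counts and probabilities are hypothetical, chosen only for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of the histogram `counts` when each of the n = sum(counts)
    independent draws takes value k with probability probs[k]."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (c_1! * ... * c_N!), kept exact in integers.
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    # Product of pi_j ** c_j over all N outcomes.
    return coeff * prod(p ** c for p, c in zip(probs, counts))

# Hypothetical example: N = 3 outcomes, n = 4 draws.
print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))
```

With $N = 2$ the same function reduces to the binomial probability, which gives a quick sanity check.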


  1. Derive the ML estimator for the parameters $\pi_i$, $i = 1, \ldots, N$. (Hint: notice that these parameters are probabilities, which makes this an optimization problem with a constraint. If you know about Lagrange multipliers feel free to use them. Otherwise, note that minimizing a function $f(a, b)$ under the constraint $a + b = 1$ is the same as minimizing the function $f(a, 1 - a)$.)

  1. Is the estimator derived in a) unbiased? What is its variance? Is this a good estimator? Why?
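
The substitution suggested in the hint can be tried numerically for the two-outcome case. The sketch below is a numerical check rather than a derivation: it replaces $b$ by $1 - a$ and searches a grid for the value of $a$ that maximizes the log-likelihood. The function names, the grid resolution, and the counts are all hypothetical.

```python
from math import log

def log_likelihood(a, b, c1, c2):
    # Log of a**c1 * b**c2; the multinomial coefficient is dropped because
    # it does not depend on the parameters.
    return c1 * log(a) + c2 * log(b)

def argmax_on_grid(c1, c2, steps=1000):
    """Maximize f(a, 1 - a) over a in (0, 1) by brute-force grid search."""
    best_a, best_val = None, float("-inf")
    for i in range(1, steps):
        a = i / steps
        # The constraint a + b = 1 is enforced by substituting b = 1 - a.
        val = log_likelihood(a, 1.0 - a, c1, c2)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

# Hypothetical counts: 3 observations of outcome 1 and 7 of outcome 2.
print(argmax_on_grid(3, 7))
```

The printed value can be compared with the closed-form estimator obtained analytically.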

  1. Problem 3.2.8 in DHS.

  1. Problem 3.2.10 in DHS. Assume that the random variables $X_1, \ldots, X_n$ are iid with a distribution of mean $\mu$, which is the quantity to estimate.

  1. In this problem we will consider the ML estimate of the Gaussian covariance matrix.

  1. Problem 3.4.13 in DHS.

  1. Derive the same result by computing derivatives in the usual way. (Hint: you may want to use a manual of matrix calculus such as that at http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html. Also, it may be easier to work with the precision matrix $P = \Sigma^{-1}$.)
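
For orientation, these are standard matrix-calculus identities of the kind such a manual lists, and they are likely to be useful when differentiating a Gaussian log-likelihood with respect to the precision matrix. They are stated here under the assumption that the entries of $P$ are treated as independent variables (the symmetry of $P$ is ignored), with $A$ a constant matrix and $x$ a constant vector.

```latex
\frac{\partial}{\partial P} \log \det P = \left(P^{-1}\right)^{T},
\qquad
\frac{\partial}{\partial P} \operatorname{tr}(A P) = A^{T},
\qquad
x^{T} P x = \operatorname{tr}\!\left(P\, x x^{T}\right).
```

The last identity rewrites a quadratic form as a trace, so that the second identity can be applied to it.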

  1. (computer) This week we will continue trying to classify our cheetah example. Once again we use the decomposition into 8 × 8 image blocks, compute the DCT of each block, and zig-zag scan. However, we are going to assume that the class-conditional densities are multivariate Gaussians of 64 dimensions.

Note: The training examples we used last time contained the absolute value of the DCT coefficients instead of the coefficients themselves. Please download the file TrainingSamplesDCT_8_new.mat and use it in this and all future exercises. For simplicity, I will still refer to it as TrainingSamplesDCT_8.mat.
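
As a reminder of what the zig-zag scan does, the sketch below orders the entries of a square block along its anti-diagonals. It follows the common JPEG-style convention, which is an assumption: if the course supplies its own scan pattern, that pattern takes precedence. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        # Cells on this anti-diagonal, listed with increasing row.
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals are traversed with decreasing row instead.
        if s % 2 == 0:
            cells.reverse()
        order.extend(cells)
    return order

def zigzag_scan(block):
    """Flatten a square block (list of rows) into a list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# First few positions visited in an 8 x 8 block.
print(zigzag_order(8)[:6])
```

Applied to the 8 × 8 DCT of a block, `zigzag_scan` yields the 64-dimensional feature vector, with low-frequency coefficients first.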

  1. Using the training data in TrainingSamplesDCT_8.mat, compute the histogram estimate of the prior $P_Y(i)$, $i \in \{\text{cheetah}, \text{grass}\}$. Using the results of problem 2, compute the maximum likelihood estimate for the prior probabilities. Compare the result with the estimates that you obtained last week. If they are the same, interpret what you did last week. If they are different, explain the differences.


  1. Using the training data in TrainingSamplesDCT_8.mat, compute the maximum likelihood estimates for the parameters of the class-conditional densities $P_{X|Y}(x|\text{cheetah})$ and $P_{X|Y}(x|\text{grass})$ under the Gaussian assumption. Denoting by $X = \{X_1, \ldots, X_{64}\}$ the vector of DCT coefficients, create 64 plots, each showing the marginal densities for the two classes, $P_{X_k|Y}(x_k|\text{cheetah})$ and $P_{X_k|Y}(x_k|\text{grass})$, $k = 1, \ldots, 64$. Use different line styles for each marginal. Select, by visual inspection, what you think are the best 8 features for classification purposes and what you think are the worst 8 features (you can use the subplot command to compare several plots at a time). Hand in the plots of the marginal densities for the best-8 and worst-8 features (once again you can use subplot; this should not require more than two sheets of paper). In each subplot, indicate the feature that it refers to.
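
The marginal of a multivariate Gaussian along coordinate $k$ is a univariate Gaussian whose mean and variance are the $k$-th entry of the mean vector and the $k$-th diagonal entry of the covariance. A minimal sketch of the per-feature computation, using the standard library only, might look as follows. The sample values are hypothetical placeholders, not the course data, and plotting is left out.

```python
from math import exp, pi, sqrt

def gaussian_ml(samples):
    """ML estimates of the mean and variance of a univariate Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    # The ML variance estimate divides by n, not n - 1.
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return exp(-(x - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)

# Hypothetical values of a single feature for each class.
cheetah_feature = [1.0, 2.0, 3.0, 4.0]
grass_feature = [0.0, 0.5, 1.0, 0.5]

for name, data in (("cheetah", cheetah_feature), ("grass", grass_feature)):
    mean, var = gaussian_ml(data)
    # Evaluate the fitted marginal at a few points; these are the curves to plot.
    curve = [gaussian_pdf(x, mean, var) for x in (0.0, 1.0, 2.0)]
    print(name, mean, var, curve)
```

Features whose two fitted curves overlap heavily carry little class information, which is the visual criterion the exercise asks for.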

  1. Compute the Bayesian decision rule and classify the locations of the cheetah image using i) the 64-dimensional Gaussians, and ii) the 8-dimensional Gaussians associated with the best 8 features. For the two cases, plot the classification masks and compute the probability of error by comparing with cheetah_mask.bmp. Can you explain the results?
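
One way to organize the classification step is sketched below in plain Python: each class gets a log-discriminant (Gaussian log-likelihood plus log-prior), the larger one wins, and the error is the fraction of locations that disagree with the ground truth. All names and the two-dimensional parameters are hypothetical, and counting disagreements is only one possible reading of "probability of error".

```python
from math import log, sqrt

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    return L

def log_discriminant(x, mean, cov, prior):
    """log P(x|class) + log P(class), without the constant -(d/2) log(2 pi),
    which is the same for every class of equal dimension."""
    L = cholesky(cov)
    # Forward substitution solves L z = x - mean; z.z is the squared
    # Mahalanobis distance.
    z = []
    for i in range(len(x)):
        r = x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    maha = sum(v * v for v in z)
    log_det = 2.0 * sum(log(L[i][i]) for i in range(len(x)))
    return -0.5 * maha - 0.5 * log_det + log(prior)

def classify(x, params):
    """params maps a label to (mean, cov, prior); pick the largest discriminant."""
    return max(params, key=lambda c: log_discriminant(x, *params[c]))

def error_rate(predicted, truth):
    """Fraction of locations whose predicted label differs from the truth."""
    return sum(p != t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical 2-D parameters, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = {
    "cheetah": ([2.0, 2.0], identity, 0.2),
    "grass": ([0.0, 0.0], identity, 0.8),
}
print(classify([2.0, 2.0], params), classify([0.0, 0.0], params))
```

The same functions apply unchanged to 64-dimensional or 8-dimensional feature vectors, although a numerical library would be the practical choice at that size.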

