Description

Problem 2.6.26 in Duda, Hart, and Stork (DHS).

In this problem we will consider the ML estimate of the parameters of a multinomial distribution. Consider a random variable X such that P_{X} (k) = π_{k}, k ∈ {1, . . . , N }. Suppose we draw n independent observations from X and form a random vector C = (C_{1}, . . . , C_{N} )^{T} where C_{k} is the number of times that the observed value is k (i.e. C is the histogram of the sample of observations). Then, C has
a multinomial distribution

P_{C_{1},…,C_{N}} (c_{1}, . . . , c_{N}) = (n! / ∏_{k=1}^{N} c_{k}!) ∏_{j=1}^{N} π_{j}^{c_{j}}.
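As a sanity check on the formula, the PMF must sum to 1 over all histograms with a fixed total count. A small sketch (the parameter values are illustrative only):

```python
from itertools import product
from math import factorial

def multinomial_pmf(counts, probs):
    """P(C_1=c_1,...,C_N=c_N) = n!/(prod c_k!) * prod pi_j^c_j."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # multinomial coefficient, exact integer arithmetic
    p = 1.0
    for c, pi in zip(counts, probs):
        p *= pi ** c
    return coef * p

# Summing over all histograms of n = 4 draws from N = 3 categories must give 1.
n, probs = 4, (0.5, 0.3, 0.2)
total = sum(
    multinomial_pmf(c, probs)
    for c in product(range(n + 1), repeat=3)
    if sum(c) == n
)
print(round(total, 10))  # → 1.0
```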

a) Derive the ML estimator for the parameters π_{i}, i = 1, . . . , N . (Hint: notice that these parameters are probabilities, which makes this an optimization problem with a constraint. If you know about Lagrange multipliers feel free to use them. Otherwise, note that minimizing a function f(a, b) under the constraint a + b = 1 is the same as minimizing the function f(a, 1 − a)).
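A sketch of the Lagrange-multiplier route suggested by the hint (only the equality constraint ∑_{k} π_{k} = 1 is enforced; the non-negativity constraints turn out to be inactive at the solution):

```latex
\mathcal{L}(\pi, \lambda)
  = \sum_{j=1}^{N} c_j \log \pi_j
    + \lambda \Big( 1 - \sum_{j=1}^{N} \pi_j \Big) + \text{const}
\qquad
\frac{\partial \mathcal{L}}{\partial \pi_i}
  = \frac{c_i}{\pi_i} - \lambda = 0
  \;\Rightarrow\; \pi_i = \frac{c_i}{\lambda},
\qquad
\sum_i \pi_i = 1 \;\Rightarrow\; \lambda = \sum_i c_i = n
  \;\Rightarrow\; \hat{\pi}_i = \frac{c_i}{n}.
```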

b) Is the estimator derived in a) unbiased? What is its variance? Is this a good estimator? Why?
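If the estimator from a) comes out as the empirical frequency π̂_{i} = C_{i}/n, a quick Monte Carlo check can probe its bias and variance; the comparison below assumes the marginal variance π_{i}(1 − π_{i})/n that follows from C_{i} being binomial(n, π_{i}):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])
n, trials = 100, 20000

# Draw `trials` histograms of n observations each and form pi_hat = C / n.
counts = rng.multinomial(n, pi, size=trials)
pi_hat = counts / n

print(pi_hat.mean(axis=0))  # ≈ pi          (consistent with unbiasedness)
print(pi_hat.var(axis=0))   # ≈ pi * (1 - pi) / n
```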

Problem 3.2.8 in DHS.

Problem 3.2.10 in DHS. Assume that the random variables X_{1}, . . . , X_{n} are iid with a distribution of mean μ, which is the quantity to estimate.

In this problem we will consider the ML estimate of the Gaussian covariance matrix.

Problem 3.4.13 in DHS.

Derive the same result by computing derivatives in the usual way. (Hint: you may want to use a manual of matrix calculus such as that at http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html. Also, it may be easier to work with the precision matrix P = Σ^{−1}.)
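With the identities ∂ log|P|/∂P = P^{−1} (for symmetric P) and ∂(v^{T}Pv)/∂P = vv^{T} from such a manual, the stationarity condition falls out directly; a sketch:

```latex
\log p(x_1, \dots, x_n \mid \mu, P)
  = \frac{n}{2} \log |P|
    - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^{T} P \, (x_i - \mu)
    + \text{const}
```

```latex
\frac{\partial}{\partial P}:\quad
\frac{n}{2} P^{-1} - \frac{1}{2} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^{T} = 0
\;\Rightarrow\;
\hat{\Sigma} = \hat{P}^{-1}
  = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^{T}.
```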

(computer) This week we will continue trying to classify our cheetah example. Once again we use the decomposition into 8 × 8 image blocks, compute the DCT of each block, and zigzag scan. However, we are going to assume that the class-conditional densities are multivariate Gaussians of 64 dimensions.
Note: The training examples we used last time contained the absolute value of the DCT coefficients instead of the coefficients themselves. Please download the file TrainingSamplesDCT_8_new.mat and use it in this and all future exercises. For simplicity, I will still refer to it as TrainingSamplesDCT_8.mat.

Using the training data in TrainingSamplesDCT_8.mat compute the histogram estimate of the prior P_{Y} (i), i ∈ {cheetah, grass}. Using the results of problem 2 compute the maximum likelihood estimate for the prior probabilities. Compare the result with the estimates that you obtained last week. If they are the same, interpret what you did last week. If they are different, explain the differences.
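A minimal sketch of the count-based prior estimate. The commented loadmat variable names and the counts used in the demo call are assumptions/illustrations, not the actual dataset:

```python
def histogram_prior(n_fg, n_bg):
    """Histogram / multinomial-ML estimate of P_Y: each class prior is its
    relative frequency among the training samples (counts over total)."""
    n = n_fg + n_bg
    return n_fg / n, n_bg / n

# In the assignment the counts would come from the training file, e.g.
# (the variable names inside the .mat file are assumptions):
#   data = scipy.io.loadmat("TrainingSamplesDCT_8_new.mat")
#   n_fg = data["TrainsampleDCT_FG"].shape[0]
#   n_bg = data["TrainsampleDCT_BG"].shape[0]
# Illustrative counts only:
p_cheetah, p_grass = histogram_prior(n_fg=250, n_bg=1050)
print(round(p_cheetah, 4), round(p_grass, 4))
```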

Using the training data in TrainingSamplesDCT_8.mat, compute the maximum likelihood estimates for the parameters of the class-conditional densities P_{X|Y} (x|cheetah) and P_{X|Y} (x|grass) under the Gaussian assumption. Denoting by X = {X_{1}, . . . , X_{64}} the vector of DCT coefficients, create 64 plots with the marginal densities for the two classes – P_{X_{k}|Y} (x_{k}|cheetah) and P_{X_{k}|Y} (x_{k}|grass), k = 1, . . . , 64 – on each. Use different line styles for each marginal. Select, by visual inspection, what you think are the best 8 features for classification purposes and what you think are the worst 8 features (you can use the subplot command to compare several plots at a time). Hand in the plots of the marginal densities for the best 8 and worst 8 features (once again you can use subplot, this should not require more than two sheets of paper). In each subplot indicate the feature that it refers to.
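A sketch of the per-class ML fit and the Gaussian marginals (NumPy; the synthetic data merely stands in for the real 64-dimensional DCT features):

```python
import numpy as np

def gaussian_ml(X):
    """ML estimates for a multivariate Gaussian: sample mean and the
    1/n-normalized (not 1/(n-1)) sample covariance."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]
    return mu, Sigma

def marginal_pdf(x, mu_k, var_k):
    """1-D Gaussian marginal for feature k: N(mu_k, var_k)."""
    return np.exp(-(x - mu_k) ** 2 / (2 * var_k)) / np.sqrt(2 * np.pi * var_k)

# Illustrative data standing in for one class's DCT features:
rng = np.random.default_rng(1)
X_fg = rng.normal(1.0, 2.0, size=(200, 64))
mu, Sigma = gaussian_ml(X_fg)

# For feature k the marginal only needs mu[k] and Sigma[k, k]; the 64
# plots would loop over k, e.g. with matplotlib subplots.
grid = np.linspace(-6, 8, 200)
pdf0 = marginal_pdf(grid, mu[0], Sigma[0, 0])
```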

Compute the Bayesian decision rule and classify the locations of the cheetah image using i) the 64-dimensional Gaussians, and ii) the 8-dimensional Gaussians associated with the best 8 features. For the two cases, plot the classification masks and compute the probability of error by comparing with cheetah_mask.bmp. Can you explain the results?
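One way to organize the Gaussian BDR (a NumPy sketch on synthetic, well-separated classes; with the real data the means, covariances, and priors would come from the previous parts):

```python
import numpy as np

def gaussian_discriminant(X, mu, Sigma, prior):
    """log P(x|i) + log P_Y(i) for a Gaussian class-conditional; under the
    0-1 loss the BDR picks the class with the larger value."""
    P = np.linalg.inv(Sigma)                     # precision matrix
    _, logdet = np.linalg.slogdet(Sigma)
    Xc = X - mu
    maha = np.einsum("ij,jk,ik->i", Xc, P, Xc)   # Mahalanobis distances
    return -0.5 * maha - 0.5 * logdet + np.log(prior)

# Illustrative 2-class, 8-dim setup (stand-in for the best-8 features):
rng = np.random.default_rng(2)
A = rng.normal(0.0, 1.0, size=(300, 8))
B = rng.normal(4.0, 1.0, size=(300, 8))

params = [(A.mean(0), np.cov(A.T, bias=True), 0.5),
          (B.mean(0), np.cov(B.T, bias=True), 0.5)]

X = np.vstack([A, B])
g = np.stack([gaussian_discriminant(X, *p) for p in params])
labels = g.argmax(axis=0)          # 0 = class A, 1 = class B
truth = np.repeat([0, 1], 300)
error = (labels != truth).mean()   # probability-of-error estimate
```

With the cheetah data the same discriminant would be evaluated at every 8 × 8 block location to produce the classification mask.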