Description
Submission to:
Please submit your homework in PDF format to the CS420 folder on the following FTP server. The file name should look like this: 0123456789_tushikui_hw1.pdf.
ftp://public.sjtu.edu.cn
username: xingzhihao
password: public
1 (10 points) PCA algorithm
Give at least two algorithms that take the data set $X = \{x_1, \ldots, x_N\}$, $x_t \in \mathbb{R}^{n \times 1}, \forall t$, as input and output the first principal component $w$. Specify the computational details of the algorithms, and discuss their advantages and limitations.
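As a starting point, two standard candidates are eigendecomposition of the sample covariance matrix and power iteration. A minimal sketch of both (function names are illustrative, not from the assignment):

```python
import numpy as np

def first_pc_eig(X):
    """First principal component via eigendecomposition of the sample
    covariance.  X is an (N, n) data matrix, one sample per row."""
    Xc = X - X.mean(axis=0)          # center the data
    C = Xc.T @ Xc / len(X)           # sample covariance, (n, n)
    vals, vecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    return vecs[:, -1]               # eigenvector of the largest eigenvalue

def first_pc_power(X, iters=200):
    """First principal component via power iteration on the covariance;
    cheaper per step than a full eigendecomposition for large n."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(X)
    w = np.random.default_rng(0).standard_normal(C.shape[0])
    for _ in range(iters):
        w = C @ w
        w /= np.linalg.norm(w)       # renormalize at every step
    return w

# Toy data stretched along the first axis, so the first PC is ~[1, 0].
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
w1, w2 = first_pc_eig(X), first_pc_power(X)
print(abs(w1 @ w2))  # close to 1: both methods agree up to sign
```

The full eigendecomposition is exact but costs $O(n^3)$; power iteration only needs matrix-vector products, but converges slowly when the top two eigenvalues are close.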
2 (10 points) Factor Analysis (FA)
Calculate the Bayesian posterior $p(y|x)$ of the Factor Analysis model $x = Ay + \mu + e$, with $p(x|y) = G(x|Ay + \mu, \Sigma_e)$, $p(y) = G(y|0, \Sigma_y)$, where $G(z|\mu, \Sigma)$ denotes a Gaussian distribution density with mean $\mu$ and covariance matrix $\Sigma$.
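As a checkpoint for your derivation (this is the well-known Gaussian conditioning identity, not a substitute for the working), the posterior is again Gaussian:

```latex
p(y|x) = G\!\left(y \,\middle|\, \mu_{y|x},\, \Sigma_{y|x}\right), \qquad
\Sigma_{y|x} = \left(\Sigma_y^{-1} + A^{\top}\Sigma_e^{-1}A\right)^{-1}, \qquad
\mu_{y|x} = \Sigma_{y|x}\, A^{\top}\Sigma_e^{-1}(x - \mu).
```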

3 (10 points) Independent Component Analysis (ICA)
Explain why maximizing non-Gaussianity could be used as a principle for ICA estimation.
4 (50 points) Dimensionality Reduction by FA
Consider the following Factor Analysis (FA) model,

$x = Ay + \mu + e,$ (1)
$p(x|y) = G(x|Ay + \mu, \sigma^2 I),$ (2)
$p(y) = G(y|0, I),$ (3)

where the observed variable $x \in \mathbb{R}^n$, the latent variable $y \in \mathbb{R}^m$, and $G(z|\mu, \Sigma)$ denotes a Gaussian distribution density with mean $\mu$ and covariance matrix $\Sigma$. Write a report on experimental comparisons of model selection performance by BIC and AIC on selecting the number of latent factors, i.e., $\dim(y) = m$.
Specifically, you need to randomly generate datasets based on FA, by varying some setting values, e.g., sample size $N$, dimensionalities $n$ and $m$, noise level $\sigma^2$, and so on. For example, set $N = 100$, $n = 10$, $m = 3$, $\sigma^2 = 0.1$, $\mu = 0$, and assign values for $A \in \mathbb{R}^{n \times m}$. The generation process is as follows:

- Randomly sample $y_t$ from the Gaussian density $G(y|0, I)$, with $\dim(y) = m = 3$;
- Randomly sample a noise vector $e_t$ from the Gaussian density $G(e|0, \sigma^2 I)$, with $\sigma^2 = 0.1$, $e_t \in \mathbb{R}^n$;
- Get $x_t = Ay_t + \mu + e_t$.

Collect all the $x_t$ as the dataset $X = \{x_t\}_{t=1}^{N}$.
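The generation steps above can be sketched as follows (the values of $A$ are arbitrary here; any fixed matrix of the right shape works):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m, sigma2 = 100, 10, 3, 0.1
mu = np.zeros(n)
A = rng.standard_normal((n, m))     # assign values for A in R^{n x m}

Y = rng.standard_normal((N, m))                     # y_t ~ G(y|0, I)
E = np.sqrt(sigma2) * rng.standard_normal((N, n))   # e_t ~ G(e|0, sigma^2 I)
X = Y @ A.T + mu + E                                # x_t = A y_t + mu + e_t
print(X.shape)  # (100, 10): N samples of dimension n
```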
The two-stage model selection process for BIC and AIC is as follows:
Stage 1: Run the EM algorithm on each dataset $X$ for $m = 1, \ldots, M$, and calculate the log-likelihood value $\ln[p(X|\hat{\Theta}_m)]$, where $\hat{\Theta}_m$ is the maximum likelihood estimate of the parameters;
Stage 2: Select the optimal $m$ by

$\hat{m} = \arg\max_{m=1,\ldots,M} J(m),$ (4)
$J_{AIC}(m) = \ln[p(X|\hat{\Theta}_m)] - d_m,$ (5)
$J_{BIC}(m) = \ln[p(X|\hat{\Theta}_m)] - \frac{\ln N}{2} d_m,$ (6)

where $d_m$ denotes the number of free parameters in the $m$-factor model.
You may set $M = 5$ if you generate the dataset $X$ based on $n = 10$, $m = 3$.
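The two-stage process can be prototyped with scikit-learn's `FactorAnalysis` (a sketch, not the required EM implementation: note that sklearn fits a diagonal noise covariance rather than $\sigma^2 I$, so the parameter count `d_m` below uses $nm$ loadings plus $n$ noise variances; adjust it to match your own parameterization):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def select_m(X, M=5):
    """Stage 1: fit FA for m = 1..M and record log-likelihoods.
    Stage 2: pick the m maximizing J_AIC and J_BIC respectively."""
    N, n = X.shape
    J_aic, J_bic = {}, {}
    for m in range(1, M + 1):
        fa = FactorAnalysis(n_components=m, random_state=0).fit(X)
        loglik = fa.score(X) * N   # score() is the mean log-likelihood per sample
        d_m = n * m + n            # free parameters: loadings + noise variances
        J_aic[m] = loglik - d_m
        J_bic[m] = loglik - 0.5 * np.log(N) * d_m
    return max(J_aic, key=J_aic.get), max(J_bic, key=J_bic.get)

# Usage on data generated from the FA model with true m = 3:
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
X = rng.standard_normal((100, 3)) @ A.T + np.sqrt(0.1) * rng.standard_normal((100, 10))
m_aic, m_bic = select_m(X, M=5)
print(m_aic, m_bic)
```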
The following code might be useful.
Python:
decomposition.FactorAnalysis.html#sklearn.decomposition.FactorAnalysis
5 (20 points) Spectral clustering
Use experiments to demonstrate when spectral clustering works well and when it would fail.
Summarize your results.
The following code might be helpful.
Python: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html
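A minimal sketch of a "works well" case: the two-moons dataset is non-convex, so centroid-based methods struggle, but a nearest-neighbor affinity graph separates the moons cleanly. For a failure case, try raising `noise` until the moons overlap, or switch to an RBF affinity with a badly scaled `gamma`:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moons with little noise: a good case for
# spectral clustering with a k-nearest-neighbor affinity graph.
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)
print(adjusted_rand_score(y, labels))  # ~1.0 when the moons are well separated
```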