- [8 points] Generative Adversarial Network (GAN)
(a) What is the cost function for classical GANs? Use Dw(x) as the discriminator and Gθ(z) as the generator, where the generator transforms z ∼ Z to x ∈ X.
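For concreteness, a Monte-Carlo estimate of the classical minimax value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] (the standard objective from Goodfellow et al., 2014, so this is effectively a reference answer for part (a)) can be sketched as follows; the toy `D`, `G`, data distribution, and sample sizes here are illustrative assumptions, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the discriminator Dw: maps x to (0, 1).
def D(x):
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid

# Illustrative stand-in for the generator Gθ: pushes z ~ N(0, 1) into X.
def G(z):
    return 2.0 * z + 1.0

x_real = rng.normal(loc=1.0, size=1000)  # assumed samples from p_data
z = rng.normal(size=1000)                # latent samples z ~ Z

# Monte-Carlo estimate of the minimax value
# V(D, G) = E_{x~p_data}[log D(x)] + E_{z~Z}[log(1 - D(G(z)))];
# the discriminator maximizes V, the generator minimizes it.
v = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))
print(v)
```

Both expectation terms are logarithms of probabilities, so the estimate is always negative; its value depends entirely on the toy choices above.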
(b) Assume arbitrary capacity for both discriminator and generator. In this case we refer to the discriminator using D(x), and denote the distribution on the data domain induced by the generator via pG(x). State an equivalent problem to the one asked for in part (a) by using pG(x) and the ground-truth data distribution pdata(x).
(c) Assuming arbitrary capacity, derive the optimal discriminator D∗(x) in terms of pdata(x) and pG(x).
You may need the Euler-Lagrange equation:
$$\frac{\partial L(x, D, \dot{D})}{\partial D} - \frac{d}{dx}\,\frac{\partial L(x, D, \dot{D})}{\partial \dot{D}} = 0$$
(d) Assuming arbitrary capacity and an optimal discriminator D∗(x), show that the optimal generator, G∗(x), generates the distribution p∗G = pdata, where pdata(x) is the data distribution.
You may need the Jensen-Shannon divergence:
$$\mathrm{JSD}(p_{data}, p_G) = \frac{1}{2} D_{KL}(p_{data}, M) + \frac{1}{2} D_{KL}(p_G, M) \quad \text{with} \quad M = \frac{1}{2}(p_{data} + p_G)$$
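As a quick numerical illustration of the hint (not part of the original problem; the discrete distributions below are made-up examples), the JSD formula can be evaluated directly from its two KL terms:

```python
import numpy as np

def kl(p, q):
    # Discrete KL divergence; assumes q > 0 wherever p > 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    # Jensen-Shannon divergence via the mixture M = (p + q) / 2,
    # exactly as in the formula above.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(jsd(p, p))  # 0.0: the JSD vanishes when the distributions coincide
print(jsd(p, q))  # log(2)/2 ≈ 0.3466 for these partially overlapping p, q
```

Note that, unlike the raw KL divergence, the JSD stays finite even when p and q do not share full support, since M covers the support of both.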
(e) More recently, researchers have proposed to use the Wasserstein distance instead of divergences to train these models, since the KL divergence often fails to give meaningful information for training. Consider three distributions, P1 ∼ U[0, 1], P2 ∼ U[0.5, 1.5], and P3 ∼ U[1, 2]. Calculate DKL(P1, P2), DKL(P1, P3), W1(P1, P2), and W1(P1, P3), where W1 is the Wasserstein-1 distance between distributions.
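A numerical sanity check for the Wasserstein part of (e) (not a substitute for the analytic calculation, and the grid size is an arbitrary choice): for one-dimensional distributions, W1(P, Q) equals the integral over u ∈ (0, 1) of |F_P⁻¹(u) − F_Q⁻¹(u)|, the quantile-coupling formula, which we approximate on a grid:

```python
import numpy as np

# Midpoint grid on (0, 1) for approximating the quantile integral.
u = np.linspace(0.005, 0.995, 100)

# Quantile functions of the three uniform distributions in part (e).
q1 = u        # P1 ~ U[0, 1]
q2 = u + 0.5  # P2 ~ U[0.5, 1.5]
q3 = u + 1.0  # P3 ~ U[1, 2]

# W1(P, Q) ≈ mean over the grid of |F_P^{-1}(u) - F_Q^{-1}(u)|.
w1_12 = np.mean(np.abs(q1 - q2))  # 0.5: quantiles differ by 0.5 everywhere
w1_13 = np.mean(np.abs(q1 - q3))  # 1.0: quantiles differ by 1.0 everywhere
print(w1_12, w1_13)
```

No such numerical shortcut exists for the KL terms: wherever P2 or P3 assigns zero density to a region where P1 has mass, the KL integrand blows up, which is exactly the pathology the question is probing.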