This is an optional quiz to test your understanding of Deep RL and Unsupervised Learning.

Write out the steps in the REINFORCE algorithm, making sure to define any symbols you use.
In the context of Deep Q-Learning, explain the following:
1. Experience Replay
2. Double Q-Learning
What is the energy function for each of these architectures?
1. Boltzmann Machine
2. Restricted Boltzmann Machine
Remember to define any variables you use.
The Variational Auto-Encoder is trained to maximize
E_{z ∼ q_φ(z | x⁽ⁱ⁾)} [log p_θ(x⁽ⁱ⁾ | z)] − D_KL(q_φ(z | x⁽ⁱ⁾) || p(z))
Briefly state what each of these two terms aims to achieve.
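For reference, here is a minimal sketch of how the objective above could be estimated numerically for a single data point. It rests on assumptions that the question does not state: q_φ(z | x) is a diagonal Gaussian with mean `mu` and log-variance `log_var`, the prior p(z) is a standard normal, and p_θ(x | z) is a unit-variance Gaussian whose mean comes from a decoder. The names `estimate_objective` and `toy_decode` are hypothetical and used only for illustration.

```python
import math
import random


def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def gaussian_log_likelihood(x, mean):
    """log N(x; mean, I): the assumed unit-variance decoder density."""
    squared_error = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (squared_error + len(x) * math.log(2.0 * math.pi))


def estimate_objective(x, mu, log_var, decode, num_samples=1000, seed=0):
    """Monte Carlo estimate of E_q[log p(x | z)] minus the closed-form KL term."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Sample z ~ q(z | x) as z = mu + sigma * eps, with eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
        total += gaussian_log_likelihood(x, decode(z))
    expected_ll = total / num_samples
    kl = gaussian_kl_to_standard_normal(mu, log_var)
    return expected_ll - kl, expected_ll, kl


def toy_decode(z):
    """Hypothetical decoder mean: maps a 1-D latent to a 2-D observation."""
    return [2.0 * z[0], -z[0]]


objective, expected_ll, kl = estimate_objective(
    x=[1.0, -0.5], mu=[0.5], log_var=[-1.0], decode=toy_decode
)
print(objective, expected_ll, kl)
```

In an actual VAE, `mu`, `log_var` and the decoder would be outputs of trained networks, and the estimate would be averaged over a minibatch and maximized by gradient ascent. The sketch only evaluates the two terms for fixed values.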
Generative Adversarial Networks traditionally made use of a two-player zero-sum game between a Generator G_θ and a Discriminator D_ψ, to compute
min_θ max_ψ (V(G_θ, D_ψ))
1. Give the formula for V(G_θ, D_ψ).
2. Explain why it may be advantageous to change the GAN algorithm so that the game is no longer zero-sum, and write the formula that the Generator would try to maximize in that case.
In the context of GANs, briefly explain what is meant by mode collapse, and list three different methods for avoiding it.