 Write out the steps in the REINFORCE algorithm,
making sure to define any symbols you use.
 In the context of Deep Q-Learning, explain the following:
 Experience Replay
 Double Q-Learning
 What is the Energy function for these architectures:
 Boltzmann Machine
 Restricted Boltzmann Machine
Remember to define any variables you use.

The Variational AutoEncoder is trained to maximize
E_{z ∼ q_{φ}(z | x^{(i)})} [log p_{θ}(x^{(i)} | z)] - D_{KL}(q_{φ}(z | x^{(i)}) || p(z))
Briefly state what each of these two terms aims to achieve.
 Generative Adversarial Networks traditionally made use of a two-player
zero-sum game between a Generator G_{θ} and a
Discriminator D_{ψ}, to compute
min_{θ} max_{ψ} (V(G_{θ}, D_{ψ}))
 Give the formula for V(G_{θ}, D_{ψ}).
 Explain why it may be advantageous to change the GAN algorithm
so that the game is no longer zero-sum, and write the formula
that the Generator would try to maximize in that case.

In the context of GANs, briefly explain what is meant by
mode collapse,
and list three different methods for avoiding it.