Generating Videos with Scene Dynamics

Chaoran Huang

Generating Videos with Scene Dynamics

NIPS 2016

Carl Vondrick

rich predictive models

computer vision and machine learning

Ph.D. Student, MIT

Generative Adversarial Networks


Spatio-temporal 3D Models


Discriminator Network (D)

Video Generator Network (G)


apply SGD on:
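The objective itself did not survive on this slide. As a hedged reconstruction, the standard GAN minimax objective (which this line most likely referred to) can be written as below. The symbols w_G and w_D for the generator and discriminator parameters, and p_x and p_z for the data and noise distributions, are notation assumed here rather than taken from the slide.

```latex
\min_{w_G} \max_{w_D} \;
  \mathbb{E}_{x \sim p_x(x)} \big[ \log D(x; w_D) \big]
  + \mathbb{E}_{z \sim p_z(z)} \big[ \log \big( 1 - D(G(z; w_G); w_D) \big) \big]
```

D is updated to tell real videos x from generated ones G(z), while G is updated to fool D; both are trained by alternating stochastic gradient steps.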

Fig1. - Discriminator Network

One Stream Architecture


Consistent in both time and space

Low-dimensional input, high-dimensional output

Only objects move
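A minimal sketch of how a low-dimensional code can grow into a video-sized volume through transposed (fractionally strided) 3D convolutions. The layer settings below (a first 2x4x4 kernel, then four 4x4x4 kernels with stride 2 and padding 1, ending at 32 frames of 64x64) are assumptions for illustration and are not stated on the slides; the function names are hypothetical.

```python
def transposed_conv_out(size, kernel, stride, pad):
    """Output length along one axis of a transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel


def generator_shapes():
    """Return the (time, height, width) shape after each assumed layer."""
    # The latent code is treated as a 1x1x1 volume with many channels.
    shape = (1, 1, 1)
    shapes = [shape]

    # Assumed first layer: 2x4x4 kernel, stride 1, no padding.
    first_kernel = (2, 4, 4)
    shape = tuple(
        transposed_conv_out(s, k, stride=1, pad=0)
        for s, k in zip(shape, first_kernel)
    )
    shapes.append(shape)

    # Assumed remaining layers: 4x4x4 kernel, stride 2, padding 1.
    # With these settings every axis doubles at each layer.
    for _ in range(4):
        shape = tuple(transposed_conv_out(s, 4, stride=2, pad=1) for s in shape)
        shapes.append(shape)

    return shapes


if __name__ == "__main__":
    for i, shape in enumerate(generator_shapes()):
        print(f"after layer {i}: time x height x width = {shape}")
```

Under these assumptions, time and space are upsampled together at every layer, which is one way to read "consistent in both time and space".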

Two Stream Architecture


Enforced static background (picture)

Moving foreground

Combine the two streams with a mask
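The three points above can be sketched as a per-pixel blend: video = mask * foreground + (1 - mask) * background, where the background is a single picture reused for every frame. This is a toy illustration on nested lists of grayscale values; the function name and the tiny sizes are hypothetical, and in the real model all three inputs would be network outputs.

```python
def compose_video(foreground, mask, background):
    """Blend a moving foreground with a static background.

    foreground: list of frames, each a list of rows of pixel values
    mask:       same shape as foreground, values in [0, 1]
    background: a single frame (list of rows), reused for every time step
    """
    video = []
    for fg_frame, mask_frame in zip(foreground, mask):
        frame = []
        for fg_row, mask_row, bg_row in zip(fg_frame, mask_frame, background):
            # mask = 1 keeps the foreground pixel, mask = 0 keeps the background.
            row = [m * f + (1.0 - m) * b
                   for f, m, b in zip(fg_row, mask_row, bg_row)]
            frame.append(row)
        video.append(frame)
    return video


if __name__ == "__main__":
    fg = [[[1.0, 1.0]], [[1.0, 1.0]]]    # 2 frames of size 1x2
    mask = [[[1.0, 0.0]], [[0.0, 0.5]]]  # where the foreground shows through
    bg = [[0.0, 0.2]]                    # one static 1x2 picture
    for t, frame in enumerate(compose_video(fg, mask, bg)):
        print(t, frame)
```

Because the same background picture is used at every time step, any pixel with mask 0 stays constant through the whole clip, which is what enforces the static background.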

Fig2. - Video Generator Network
Fig3. - An example of the two stream architecture


2 years of Flickr Videos

9 TB, 35 million clips

5,000+ hours of video

26 TB raw data