Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects
Adam R. Kosiorek, Hyunjik Kim, Ingmar Posner, Yee Whye Teh

TL;DR
This paper introduces SQAIR, a deep generative model for videos that can detect, track, and generate moving objects over time, improving upon previous models by handling occlusions and applying to real-world data.
Contribution
SQAIR extends AIR by incorporating temporal dynamics, enabling unsupervised detection, tracking, and future frame generation of moving objects in videos.
Findings
On moving multi-MNIST, SQAIR overcomes AIR's limitations in detecting overlapping or partially occluded objects by leveraging the temporal consistency of objects.
SQAIR successfully tracks pedestrians in CCTV footage without supervision.
The model can generate future frames conditioned on the current frame, simulating the expected motion of objects.
Abstract
We present Sequential Attend, Infer, Repeat (SQAIR), an interpretable deep generative model for videos of moving objects. It can reliably discover and track objects throughout the sequence of frames, and can also generate future frames conditioning on the current frame, thereby simulating expected motion of objects. This is achieved by explicitly encoding object presence, locations and appearances in the latent variables of the model. SQAIR retains all strengths of its predecessor, Attend, Infer, Repeat (AIR, Eslami et al., 2016), including learning in an unsupervised manner, and addresses its shortcomings. We use a moving multi-MNIST dataset to show limitations of AIR in detecting overlapping or partially occluded objects, and show how SQAIR overcomes them by leveraging temporal consistency of objects. Finally, we also apply SQAIR to real-world pedestrian CCTV data, where it learns to…
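Illustrative sketch
The abstract says the model explicitly encodes object presence, location, and appearance in its latent variables. The toy Python sketch below illustrates that idea only; it is not the authors' implementation. The names ObjectLatent, render, and propagate are hypothetical, the appearance latent is replaced by a raw pixel patch, and the fixed-velocity update is an assumed stand-in for whatever learned dynamics the actual model uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectLatent:
    """Hypothetical per-object latent: presence, location, appearance."""
    present: bool              # is the object visible in this frame?
    where: Tuple[int, int]     # (row, col) of the patch's top-left corner
    what: List[List[float]]    # stand-in for appearance: a small pixel patch


def render(latents: List[ObjectLatent], height: int, width: int) -> List[List[float]]:
    """Compose a frame by pasting each present object's patch onto a blank canvas."""
    canvas = [[0.0] * width for _ in range(height)]
    for z in latents:
        if not z.present:
            continue
        r0, c0 = z.where
        for dr, row in enumerate(z.what):
            for dc, value in enumerate(row):
                r, c = r0 + dr, c0 + dc
                if 0 <= r < height and 0 <= c < width:
                    # Clip at 1.0 so overlapping objects saturate instead of overflowing.
                    canvas[r][c] = min(1.0, canvas[r][c] + value)
    return canvas


def propagate(
    latents: List[ObjectLatent],
    velocities: List[Tuple[int, int]],
    height: int,
    width: int,
) -> List[ObjectLatent]:
    """Assumed toy dynamics: shift each object by a fixed velocity and
    mark it absent once its top-left corner leaves the frame."""
    moved = []
    for z, (vr, vc) in zip(latents, velocities):
        r, c = z.where[0] + vr, z.where[1] + vc
        inside = 0 <= r < height and 0 <= c < width
        moved.append(ObjectLatent(z.present and inside, (r, c), z.what))
    return moved
```

Calling propagate and then render on the result produces a toy "future frame" in which each object keeps its identity and appearance while its position and presence change, which is the kind of temporal consistency the abstract refers to.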
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Neural Network Applications
