Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

Adam R. Kosiorek; Hyunjik Kim; Ingmar Posner; Yee Whye Teh

arXiv:1806.01794 · cs.LG · November 22, 2018 · 94 cites

TL;DR

This paper introduces SQAIR, a deep generative model for videos that detects, tracks, and generates moving objects over time. It improves on its predecessor, AIR, by handling overlapping and partially occluded objects, and it also works on real-world CCTV data.

Contribution

SQAIR extends AIR by incorporating temporal dynamics, enabling unsupervised detection, tracking, and future frame generation of moving objects in videos.

Findings

01

SQAIR outperforms AIR in detecting overlapping and occluded objects.

02

SQAIR successfully tracks pedestrians in CCTV footage without supervision.

03

The model can generate realistic future frames based on current observations.

Abstract

We present Sequential Attend, Infer, Repeat (SQAIR), an interpretable deep generative model for videos of moving objects. It can reliably discover and track objects throughout the sequence of frames, and can also generate future frames conditioning on the current frame, thereby simulating expected motion of objects. This is achieved by explicitly encoding object presence, locations and appearances in the latent variables of the model. SQAIR retains all strengths of its predecessor, Attend, Infer, Repeat (AIR, Eslami et al., 2016), including learning in an unsupervised manner, and addresses its shortcomings. We use a moving multi-MNIST dataset to show limitations of AIR in detecting overlapping or partially occluded objects, and show how SQAIR overcomes them by leveraging temporal consistency of objects. Finally, we also apply SQAIR to real-world pedestrian CCTV data, where it learns to…
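The abstract says the model explicitly encodes object presence, location and appearance in its latent variables and keeps objects consistent across frames. A minimal sketch of what such a per-object latent record could look like is given below. It is not taken from the paper or from the official repository: the class names, field layout, the `step` helper, and the split into "propagated" and "discovered" objects are all assumptions made for illustration, and no inference or learning is performed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectLatent:
    """Per-object latent state for one frame (hypothetical field names)."""
    present: bool                      # is the object in this frame?
    where: Tuple[float, float, float]  # assumed layout: (scale, centre_x, centre_y)
    what: List[float]                  # appearance code of arbitrary length


@dataclass
class FrameLatents:
    """All object latents for one frame, keyed by a persistent object id."""
    objects: Dict[int, ObjectLatent] = field(default_factory=dict)


def step(prev: FrameLatents,
         propagated: Dict[int, ObjectLatent],
         discovered: List[ObjectLatent]) -> FrameLatents:
    """Build the next frame's latents from carried-over and new objects.

    `propagated` maps ids from `prev` to their updated latents;
    `discovered` lists candidate new objects. Both are assumed inputs here.
    """
    nxt = FrameLatents()
    # Objects carried over keep their id; ones marked absent are dropped.
    for obj_id, latent in propagated.items():
        if obj_id in prev.objects and latent.present:
            nxt.objects[obj_id] = latent
    # New objects get ids larger than any id in the previous frame.
    next_id = max(prev.objects, default=-1) + 1
    for latent in discovered:
        if latent.present:
            nxt.objects[next_id] = latent
            next_id += 1
    return nxt
```

Calling `step` once per frame yields a sequence of `FrameLatents` in which an id refers to the same object for as long as it stays present, which is the bookkeeping that tracking relies on.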

Code & Models

Repositories

akosiorek/sqair
tf · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Neural Network Applications