Consistent Generative Query Networks

Ananya Kumar; S. M. Ali Eslami; Danilo J. Rezende; Marta Garnelo; Fabio Viola; Edward Lockhart; Murray Shanahan

arXiv:1807.02033·cs.CV·April 23, 2019·22 cites

Open Access

TL;DR

This paper introduces a novel generative model that efficiently produces temporally consistent video frames at arbitrary time points, overcoming the limitations of autoregressive methods in speed and flexibility.

Contribution

The authors propose a model that generates a latent representation from any set of frames, enabling fast, simultaneous sampling of consistent frames at arbitrary times, including jump-ahead capabilities.
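The query interface described above can be illustrated with a toy sketch. This is not the authors' architecture: the random linear maps, the sum-based aggregation, the Gaussian latent, and every name (`encode`, `sample_latent`, `decode`, `W_enc`, `W_dec`) are assumptions made for illustration only. It shows just the control flow: encode any set of (time, frame) pairs, draw one latent for the whole video, then decode frames at arbitrary times from that same latent.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME_DIM, LATENT_DIM = 16, 8

# Hypothetical fixed random weights standing in for trained networks.
W_enc = rng.normal(size=(FRAME_DIM + 1, LATENT_DIM))
W_dec = rng.normal(size=(LATENT_DIM + 1, FRAME_DIM))


def encode(context):
    """Embed each (time, frame) pair and sum the embeddings.

    Summation is an assumed aggregation; it makes the representation
    independent of the order and number of context frames.
    """
    feats = [np.tanh(np.concatenate(([t], frame)) @ W_enc) for t, frame in context]
    return np.sum(feats, axis=0)


def sample_latent(r, rng):
    """Draw a single latent for the whole video, centred on r (toy prior)."""
    return r + rng.normal(size=r.shape)


def decode(z, t):
    """Render the frame at time t from the shared latent z (deterministic here)."""
    return np.tanh(np.concatenate(([t], z)) @ W_dec)


# Context frames need not be consecutive.
context = [(0.0, rng.normal(size=FRAME_DIM)), (3.0, rng.normal(size=FRAME_DIM))]
r = encode(context)
z = sample_latent(r, rng)

# Query times in any order; t=10.0 is rendered without producing
# any intermediate frame first.
query_times = [10.0, 1.5, 7.0]
frames = {t: decode(z, t) for t in query_times}
```

Because every queried frame is decoded from the same `z`, the frames belong to one sampled video, and no frame depends on a previously generated frame. An autoregressive sampler would instead have to generate each intermediate frame before reaching the last one.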

Findings

01 Sampling is substantially faster than with autoregressive models, without loss in fidelity on synthetic video.

02 Frames can be sampled at arbitrary time points, both future and past, without generating intermediate frames.

03 The framework is applied to both synthetic video prediction and 3D scene reconstruction.

Abstract

Stochastic video prediction models take in a sequence of image frames, and generate a sequence of consecutive future image frames. These models typically generate future frames in an autoregressive fashion, which is slow and requires the input and output frames to be consecutive. We introduce a model that overcomes these drawbacks by generating a latent representation from an arbitrary set of frames that can then be used to simultaneously and efficiently sample temporally consistent frames at arbitrary time-points. For example, our model can "jump" and directly sample frames at the end of the video, without sampling intermediate frames. Synthetic video evaluations confirm substantial gains in speed and functionality without loss in fidelity. We also apply our framework to a 3D scene reconstruction dataset. Here, our model is conditioned on camera location and can sample consistent sets…

Taxonomy

Topics: Graph Theory and Algorithms · Advanced Database Systems and Queries · Data Management and Algorithms

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings