Video Occupancy Models

Manan Tomar; Philippe Hansen-Estruch; Philip Bachman; Alex Lamb; John Langford; Matthew E. Taylor; Sergey Levine

arXiv:2407.09533·cs.CV·July 16, 2024

TL;DR

Video Occupancy Models (VOCs) are a family of video prediction models that operate in a compact latent space and directly predict the discounted distribution of future states in a single step, avoiding both pixel-level prediction and multistep roll-outs in downstream control tasks.

Contribution

VOCs predict the discounted distribution of future states in a single step within a compact latent space. This avoids multistep roll-outs, and the authors report that both properties are beneficial when building predictive models of video for downstream control.

Findings

01 VOCs operate efficiently in a compact latent space.

02 VOCs outperform prior models in predictive accuracy.

03 Code is publicly available for reproducibility.

Abstract

We introduce a new family of video prediction models designed to support downstream control tasks. We call these models Video Occupancy models (VOCs). VOCs operate in a compact latent space, thus avoiding the need to make predictions about individual pixels. Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step, thus avoiding the need for multistep roll-outs. We show that both properties are beneficial when building predictive models of video for use in downstream control. Code is available at https://github.com/manantomar/video-occupancy-models.
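To make "the discounted distribution of future states" concrete, here is a minimal sketch under stated assumptions. It is not taken from the paper or its repository. It assumes the usual definition in which the state k steps ahead is weighted by (1 - gamma) * gamma^(k - 1), so drawing from that distribution along a recorded trajectory amounts to drawing a geometrically distributed offset. The function name `sample_discounted_future`, the clamping at the end of the trajectory, and the use of plain list elements as stand-ins for latent states are all illustrative choices. The authors' actual training procedure may differ.

```python
import random


def sample_discounted_future(trajectory, t, gamma, rng=random):
    """Sample one future state of trajectory[t] with geometric discounting.

    Hypothetical helper for illustration only. The offset k >= 1 is drawn
    with probability (1 - gamma) * gamma**(k - 1); offsets that run past
    the end of the trajectory are clamped to the final state (an
    assumption made here to handle finite trajectories).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    if not 0 <= t < len(trajectory) - 1:
        raise IndexError("t must index a state with at least one successor")

    # Keep stepping forward with probability gamma; stop with 1 - gamma.
    k = 1
    while rng.random() < gamma:
        k += 1

    # Clamp to the last recorded state of the finite trajectory.
    idx = min(t + k, len(trajectory) - 1)
    return trajectory[idx]


if __name__ == "__main__":
    # Integers stand in for latent states of consecutive video frames.
    latent_states = list(range(20))
    draws = [sample_discounted_future(latent_states, 0, 0.8) for _ in range(5)]
    print(draws)
```

A model trained to match samples like these answers "where will the system be in the discounted future?" with one prediction, rather than by chaining many one-step predictions. With `gamma = 0` the sketch reduces to ordinary next-state prediction.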

Code & Models

Repositories

manantomar/video-occupancy-models
PyTorch · Official

Models

🤗 manantomar/video-occupancy-models (model)

Datasets

manantomar/video-occupancy-models-datasets
dataset · 38 downloads

Taxonomy

Topics: Image and Video Quality Assessment