HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

Younggyo Seo; Kimin Lee; Fangchen Liu; Stephen James; Pieter Abbeel

arXiv:2209.07143 · cs.CV · September 16, 2022 · 1 citation

Open Access

TL;DR

HARP introduces a high-fidelity autoregressive latent video prediction model that leverages a VQ-GAN image generator and a causal transformer to produce high-resolution videos with improved quality and efficiency.
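The VQ-GAN stage turns each frame into a grid of discrete tokens by replacing every spatial latent vector with the index of its nearest codebook entry. The sketch below shows only that generic vector-quantization lookup in plain Python, assuming a toy 2-dimensional codebook; the VQ-GAN encoder and decoder networks are omitted, and the helper names `quantize` and `squared_distance` and all numbers are illustrative assumptions, not HARP's code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each continuous latent vector to the index of its nearest codebook entry.

    Returns the discrete token ids plus the quantized vectors (the codebook
    entries those ids point to), which a decoder would map back to pixels.
    """
    ids: List[int] = []
    for z in latents:
        # Nearest-neighbour search over the (hypothetical, toy) codebook.
        best = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        ids.append(best)
    return ids, [codebook[k] for k in ids]


# Toy example: 4 latent vectors (e.g. a 2x2 spatial grid) and a 3-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.7, 0.4]]
tokens, quantized = quantize(latents, codebook)
print(tokens)  # [0, 1, 2, 1]
```

In the two-stage setup, the transformer only ever sees these integer ids, never raw pixels.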

Contribution

The paper presents a scalable method that combines a VQ-GAN image generator with a causal transformer, achieving high-resolution video prediction with fewer parameters than competing state-of-the-art approaches and only minimal modifications to existing models.
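"Causal" here means that each token may attend only to itself and to earlier tokens, so the model can be trained on whole sequences yet sampled one token at a time. Below is a minimal single-head sketch of causal scaled dot-product attention in plain Python. It assumes queries, keys, and values are supplied directly; a real transformer would produce them with learned projections and use multiple heads and batched tensor operations. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are n x d lists of lists. Future positions (j > i) are dropped
    before the softmax, so output row i depends only on tokens 0..i.
    """
    n, d = len(q), len(q[0])
    mask = causal_mask(n)
    out: Matrix = []
    for i in range(n):
        # Scores for the visible positions j = 0..i only.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(n) if mask[i][j]]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors (weights[j] pairs with v[j]).
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(x, x, x)
print(out[0])  # [1.0, 0.0]: the first token can only attend to itself
```

Changing the last row of `x` leaves `out[0]` and `out[1]` unchanged, which is the property that lets the model predict later tokens without seeing them.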

Findings

01. Achieves competitive performance on standard video prediction benchmarks.
02. Produces high-resolution (256x256) videos.
03. Uses techniques such as top-k sampling and data augmentation to improve prediction quality.
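Top-k sampling restricts each step's next-token distribution to the k highest-scoring codebook ids, renormalizes over those, and samples; every other id gets zero probability. The following is a generic sketch of that technique in plain Python. The `temperature` parameter, the example logits, and the value of k are illustrative assumptions, not HARP's settings.

```python
import math
import random
from typing import List, Optional


def top_k_sample(logits: List[float], k: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token id from only the k highest-scoring logits."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    rng = rng or random.Random()
    # Indices of the k largest logits; all other ids are excluded.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max before exponentiating for stability
    weights = [math.exp(s - m) for s in scaled]
    # random.choices accepts unnormalized weights.
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 0.5, 3.0, -1.0, 1.5]
rng = random.Random(0)
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only ids 0 and 2 (the two largest logits) can be drawn
```

With k=1 this reduces to greedy (argmax) decoding; larger k trades fidelity for diversity.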

Abstract

Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics. Recently, autoregressive latent video models have proved to be a powerful video prediction tool by separating video prediction into two sub-problems: pre-training an image generator model, followed by learning an autoregressive prediction model in the latent space of the image generator. However, successfully generating high-fidelity and high-resolution videos has yet to be seen. In this work, we investigate how to train an autoregressive latent video prediction model capable of predicting high-fidelity future frames with minimal modification to existing models, and produce high-resolution (256x256) videos. Specifically, we scale up prior models by employing a high-fidelity image generator (VQ-GAN) with a causal transformer model, and…
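The two sub-problems described above can be sketched as a token-level rollout. Stage 1 (the pretrained image generator) is assumed to have already turned each context frame into a flat list of codebook ids; stage 2 extends that sequence one token at a time, each conditioned on everything before it. In this sketch `next_token` is a hypothetical stand-in for the causal transformer plus a sampling rule such as top-k, the toy `copy_previous_frame` predictor is invented for illustration, and the frame-after-frame token layout is an assumption.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for "causal transformer + sampling":
# maps the full token history to the next token id.
NextToken = Callable[[Sequence[int]], int]


def rollout(context_frames: List[List[int]], num_future: int,
            tokens_per_frame: int, next_token: NextToken) -> List[List[int]]:
    """Predict future frames as token lists, one token at a time.

    Each frame is a flattened list of codebook ids (stage 1 output).
    Every new token is conditioned on all previous tokens (stage 2).
    Returns only the predicted frames; an image decoder would map each
    back to pixels (not shown).
    """
    for frame in context_frames:
        if len(frame) != tokens_per_frame:
            raise ValueError("every context frame must have tokens_per_frame tokens")
    history: List[int] = [t for frame in context_frames for t in frame]
    start = len(history)
    for _ in range(num_future * tokens_per_frame):
        history.append(next_token(history))
    generated = history[start:]
    # Split the generated tokens back into per-frame chunks.
    return [generated[i:i + tokens_per_frame]
            for i in range(0, len(generated), tokens_per_frame)]


TPF = 4  # toy tokens per frame


def copy_previous_frame(history: Sequence[int]) -> int:
    """Toy predictor: repeat the token at the same position one frame earlier."""
    return history[-TPF]


context = [[1, 2, 3, 4], [5, 6, 7, 8]]
future = rollout(context, num_future=2, tokens_per_frame=TPF, next_token=copy_previous_frame)
print(future)  # [[5, 6, 7, 8], [5, 6, 7, 8]]
```

The loop length is what makes high resolution costly: assuming a VQ-GAN downsampling factor of 16 (an assumption, not a figure from this page), a 256x256 frame becomes a 16x16 grid of 256 tokens, so predicting 10 frames takes 2,560 sequential `next_token` calls.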

Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications