Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding
Lvmin Zhang, Shengqu Cai, Muyang Li, Chong Zeng, Beijia Lu, Anyi Rao, Song Han, Gordon Wetzstein, and Maneesh Agrawala

TL;DR
This paper introduces a lightweight, pretrained history encoder for autoregressive video generation that efficiently encodes long video histories into short embeddings, maintaining content consistency with reduced computational resources.
Contribution
A novel encoder for long video histories, pretrained with a frame query objective, that enables efficient autoregressive video generation on limited hardware.
Findings
The lightweight embeddings achieve performance comparable to heavier alternatives.
Pretraining with a frame query objective improves history coverage.
The approach targets personal users and local workflows, where compute and memory budgets are limited.
Abstract
Autoregressive video generation relies on history context for content consistency and storytelling. As video histories grow longer, efficiently encoding them remains an open problem, particularly for personal users and local workflows where compute and memory budgets are limited. We present a lightweight history encoder that maps long video histories into short-length embeddings, pretrained with a frame query objective that learns to attend to content features at arbitrary temporal positions. The pretraining stage provides the encoder with dense history coverage on large-scale video data; the subsequent finetuning stage adapts the pretrained encoder under an autoregressive video generation objective to establish content-level consistency. In this way, the lightweight embeddings achieve comparable performance to heavier alternatives. We evaluate the framework with ablative settings and…
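Illustrative sketch
The abstract does not specify the encoder architecture, so the following is only one possible reading of "maps long video histories into short-length embeddings" and of a "frame query objective", not the authors' method. It assumes, hypothetically, that a fixed set of slot vectors cross-attends over per-frame feature vectors, and that a query built from a temporal position reads the embedding back out and is scored against the true frame feature with a mean squared error. All names (`attend`, `time_code`, `encode_history`, `frame_query_loss`) and all sizes are invented for illustration, and the weights are random rather than trained.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(queries, keys, values):
    """Scaled dot-product attention: returns one output row per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def time_code(t, dim):
    """Sinusoidal code for a normalized temporal position t in [0, 1]."""
    code = []
    for i in range(dim):
        angle = (i + 1) * math.pi * t
        code.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return code

def encode_history(frames, slots):
    """Compress T frame features into len(slots) embeddings (assumed design)."""
    n, dim = len(frames), len(frames[0])
    # Keys carry content plus position; values carry content only.
    keys = [[f + p for f, p in zip(frame, time_code(i / max(n - 1, 1), dim))]
            for i, frame in enumerate(frames)]
    return attend(slots, keys, frames)

def frame_query_loss(embedding, frames, t_index):
    """Query the embedding at one temporal position; MSE against that frame."""
    n, dim = len(frames), len(frames[0])
    query = time_code(t_index / max(n - 1, 1), dim)
    pred = attend([query], embedding, embedding)[0]
    target = frames[t_index]
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / dim

rng = random.Random(0)
DIM, NUM_SLOTS = 8, 4  # hypothetical sizes
slots = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_SLOTS)]
history = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(64)]

embedding = encode_history(history, slots)       # 64 frames -> 4 vectors
short_embedding = encode_history(history[:16], slots)
loss = frame_query_loss(embedding, history, t_index=10)
```

Under these assumptions the embedding length depends only on the number of slots, not on the history length, which is the property the abstract emphasizes for limited compute and memory budgets. In a trained system the slots and projections would be learned and the loss minimized over randomly sampled positions; none of that is shown here.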
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies · Advanced Data Compression Techniques
