Plenoptic Video Generation

Xiao Fu; Shitao Tang; Min Shi; Xian Liu; Jinwei Gu; Ming-Yu Liu; Dahua Lin; Chen-Hsuan Lin

arXiv:2601.05239·cs.CV·January 9, 2026

Open Access

TL;DR

PlenopticDreamer is a novel framework for multi-view video generation that maintains spatio-temporal coherence and achieves state-of-the-art results through a camera-guided, autoregressive, and self-conditioned approach.

Contribution

It introduces a multi-in-single-out video-conditioned model with adaptive retrieval and advanced training strategies for coherent, high-fidelity plenoptic video re-rendering.

Findings

1. Achieves state-of-the-art performance on Basic and Agibot benchmarks.

2. Demonstrates superior view synchronization and visual fidelity.

3. Supports diverse view transformations in complex scenarios.

Abstract

Camera-controlled generative video re-rendering methods, such as ReCamMaster, have achieved remarkable progress. However, despite their success in single-view settings, these works often struggle to maintain consistency across multi-view scenarios. Ensuring spatio-temporal coherence in hallucinated regions remains challenging due to the inherent stochasticity of generative models. To address this, we introduce PlenopticDreamer, a framework that synchronizes generative hallucinations to maintain spatio-temporal memory. The core idea is to train a multi-in-single-out video-conditioned model in an autoregressive manner, aided by a camera-guided video retrieval strategy that adaptively selects salient videos from previous generations as conditional inputs. In addition, our training incorporates progressive context-scaling to improve convergence, self-conditioning to enhance robustness against…
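The autoregressive, retrieval-conditioned loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how salient videos are chosen, so the sketch assumes a simple nearest-camera rule (translation distance plus a weighted rotation angle). The names `camera_distance`, `retrieve_context`, `generate_views`, the `rot_weight` and `k` parameters, and the `model(context, cam)` call signature are all hypothetical.

```python
import math

def rotation_angle(Ra, Rb):
    """Geodesic angle (radians) between two 3x3 rotations given as nested lists."""
    # trace(Ra^T Rb) equals the element-wise (Frobenius) inner product.
    trace = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    cos = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numerical safety
    return math.acos(cos)

def camera_distance(cam_a, cam_b, rot_weight=1.0):
    """Assumed pose distance: Euclidean translation gap + weighted rotation angle."""
    (Ra, ta), (Rb, tb) = cam_a, cam_b
    return math.dist(ta, tb) + rot_weight * rotation_angle(Ra, Rb)

def retrieve_context(target_cam, history, k=2):
    """Pick the k previously generated videos whose cameras are closest to the target.

    history is a list of (camera, video) pairs; a camera is a (rotation, translation) pair.
    """
    ranked = sorted(history, key=lambda item: camera_distance(target_cam, item[0]))
    return [video for _, video in ranked[:k]]

def generate_views(source_video, target_cams, model, k=2):
    """Autoregressive loop: each new view is conditioned on the source video
    plus up to k retrieved earlier generations (multi-in, single-out)."""
    history, outputs = [], []
    for cam in target_cams:
        context = [source_video] + retrieve_context(cam, history, k)
        video = model(context, cam)  # hypothetical model call
        history.append((cam, video))
        outputs.append(video)
    return outputs
```

In this sketch the conditioning set grows from one video (the source alone) up to `k + 1` videos as generations accumulate, which is one plausible reading of "multi-in-single-out"; the actual retrieval criterion and context size in the paper may differ.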

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Coding and Compression Technologies