Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
Dejia Xu, Yifan Jiang, Chen Huang, Liangchen Song, Thorsten Gernoth, Liangliang Cao, Zhangyang Wang, Hao Tang

TL;DR
Cavia is a novel framework that enables camera-controllable, multi-view video generation with high spatial and temporal consistency, allowing precise camera motion specification alongside object motion, surpassing previous methods in quality.
Contribution
Introduces Cavia, the first framework for joint camera and object motion control in multi-view video generation with view-integrated attention modules.
Findings
Outperforms state-of-the-art methods in geometric consistency
Achieves high perceptual quality in generated videos
Supports diverse data sources for training
Abstract
In recent years there have been remarkable breakthroughs in image-to-video generation. However, the 3D consistency and camera controllability of generated frames have remained unsolved. Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. To address these limitations, we introduce Cavia, a novel framework for camera-controllable, multi-view video generation, capable of converting an input image into multiple spatiotemporally consistent videos. Our framework extends the spatial and temporal attention modules into view-integrated attention modules, improving both viewpoint and temporal consistency. This flexible design allows for joint training with diverse curated data sources, including…
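The abstract says that Cavia extends spatial and temporal attention into "view-integrated" attention, but it does not give the exact token layout. The sketch below shows one plausible reading and should not be taken as the authors' implementation. It assumes latent tokens shaped (views V, frames T, spatial tokens N, channels C). In this reading, "view-integrated" means folding the view axis into the sequence that each attention operates over. Spatial attention then lets each frame index attend across the tokens of all views at that frame, and temporal attention lets each spatial location attend across all views and frames. The function names are hypothetical. For brevity, learned Q/K/V projections and multiple heads are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention (no learned projections).
    tokens: (batch, L, C) -> (batch, L, C)."""
    c = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)  # (batch, L, L)
    return softmax(scores, axis=-1) @ tokens

def view_integrated_spatial(x):
    """Hypothetical view-integrated spatial attention.
    x: (V, T, N, C). For each frame t, attend over all V*N tokens of all views."""
    V, T, N, C = x.shape
    seq = x.transpose(1, 0, 2, 3).reshape(T, V * N, C)       # batch over frames
    out = self_attention(seq)
    return out.reshape(T, V, N, C).transpose(1, 0, 2, 3)     # back to (V, T, N, C)

def view_integrated_temporal(x):
    """Hypothetical view-integrated temporal attention.
    x: (V, T, N, C). For each spatial location n, attend over all V*T tokens."""
    V, T, N, C = x.shape
    seq = x.transpose(2, 0, 1, 3).reshape(N, V * T, C)       # batch over locations
    out = self_attention(seq)
    return out.reshape(N, V, T, C).transpose(1, 2, 0, 3)     # back to (V, T, N, C)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 4, 16, 8))  # 2 views, 4 frames, 16 tokens, 8 channels
    y = view_integrated_temporal(view_integrated_spatial(x))
    print(y.shape)
```

Setting V = 1 reduces both functions to ordinary per-frame spatial and per-location temporal attention. With V > 1, each view's output depends on the other views' tokens, and that coupling is what could encourage cross-view consistency.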
Taxonomy
Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Advanced Optical Imaging Technologies
Methods: Softmax · Attention Is All You Need
