OutDreamer: Video Outpainting with a Diffusion Transformer
Linhao Zhong, Fan Li, Yi Huang, Jianzhuang Liu, Renjing Pei, Fenglong Song

TL;DR
OutDreamer is a novel diffusion transformer-based framework for video outpainting that ensures high-quality, temporally consistent extended videos through a mask-driven self-attention mechanism and a latent alignment loss.
Contribution
The paper introduces OutDreamer, which combines a diffusion transformer with a mask-driven self-attention layer and a latent alignment loss to improve video outpainting.
Findings
OutDreamer outperforms existing zero-shot methods on benchmark datasets.
The model maintains high spatial and temporal consistency in generated videos.
Extensive evaluations confirm the effectiveness of the proposed components.
Abstract
Video outpainting is a challenging task that generates new video content by extending beyond the boundaries of an original input video, requiring both temporal and spatial consistency. Many state-of-the-art methods utilize latent diffusion models with U-Net backbones but still struggle to achieve high quality and adaptability in generated content. Diffusion transformers (DiTs) have emerged as a promising alternative because of their superior performance. We introduce OutDreamer, a DiT-based video outpainting framework comprising two main components: an efficient video control branch and a conditional outpainting branch. The efficient video control branch effectively extracts masked video information, while the conditional outpainting branch generates missing content based on these extracted conditions. Additionally, we propose a mask-driven self-attention layer that dynamically…
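The abstract describes the task input as a masked video: the original frames sit inside a larger canvas, and a mask marks the region to be generated. The sketch below illustrates only that input preparation in plain Python. It is not taken from the paper: the function name `prepare_outpainting_input`, the padding parameters, the zero fill value, and the mask convention (1 = region to generate, 0 = known pixel) are all assumptions made for illustration.

```python
def prepare_outpainting_input(video, left, right, top, bottom, fill=0):
    """Place each frame on a larger canvas and build a matching mask.

    video: list of frames, each a list of rows of pixel values (H x W).
    Returns (padded_video, masks). Mask convention is an assumption:
    1 marks a pixel to be generated, 0 marks a known pixel.
    """
    padded_video, masks = [], []
    for frame in video:
        h, w = len(frame), len(frame[0])
        out_h, out_w = h + top + bottom, w + left + right
        # Start with an empty canvas and a mask that marks everything unknown.
        canvas = [[fill] * out_w for _ in range(out_h)]
        mask = [[1] * out_w for _ in range(out_h)]
        for y in range(h):
            for x in range(w):
                # Copy the original pixel and mark it as known.
                canvas[top + y][left + x] = frame[y][x]
                mask[top + y][left + x] = 0
        padded_video.append(canvas)
        masks.append(mask)
    return padded_video, masks


# Two tiny 2 x 3 frames, extended by 1 column on the left and 2 on the right.
video = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
padded, masks = prepare_outpainting_input(video, left=1, right=2, top=0, bottom=0)
```

Under these assumptions, a model would receive `padded` and `masks` as its conditions and fill in only the positions where the mask is 1, leaving the known pixels unchanged.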
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Video Analysis and Summarization
