OutDreamer: Video Outpainting with a Diffusion Transformer

Linhao Zhong; Fan Li; Yi Huang; Jianzhuang Liu; Renjing Pei; Fenglong Song

arXiv:2506.22298·cs.CV·June 30, 2025

Open Access

TL;DR

OutDreamer is a diffusion transformer (DiT)-based framework for video outpainting. It uses a mask-driven self-attention layer and a latent alignment loss to produce high-quality, temporally consistent extended videos.

Contribution

The paper introduces OutDreamer, which pairs a diffusion transformer with a mask-driven self-attention layer and a latent alignment loss to improve video outpainting quality.

Findings

01. OutDreamer outperforms existing zero-shot methods on benchmark datasets.

02. The model maintains high spatial and temporal consistency in generated videos.

03. Extensive evaluations confirm the effectiveness of the proposed components.

Abstract

Video outpainting is a challenging task that generates new video content by extending beyond the boundaries of an original input video, requiring both temporal and spatial consistency. Many state-of-the-art methods utilize latent diffusion models with U-Net backbones but still struggle to achieve high quality and adaptability in generated content. Diffusion transformers (DiTs) have emerged as a promising alternative because of their superior performance. We introduce OutDreamer, a DiT-based video outpainting framework comprising two main components: an efficient video control branch and a conditional outpainting branch. The efficient video control branch effectively extracts masked video information, while the conditional outpainting branch generates missing content based on these extracted conditions. Additionally, we propose a mask-driven self-attention layer that dynamically…
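
To make the task setup concrete, here is a minimal sketch of how an outpainting input might be prepared: each frame is placed on a larger canvas, and a binary mask marks the region to be generated. This is an illustration only, not the paper's pipeline. The function name `make_outpainting_input`, the centered placement, the fill value, and the mask convention (1 = unknown, 0 = known) are all assumptions.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame as H x W nested lists


def make_outpainting_input(
    frames: List[Frame], pad_x: int, pad_y: int, fill: float = 0.0
) -> Tuple[List[Frame], Frame]:
    """Place each frame at the center of a larger canvas.

    Returns the padded frames and one binary mask shared by all frames.
    Assumed convention: 1 marks unknown pixels to generate, 0 marks known pixels.
    """
    height, width = len(frames[0]), len(frames[0][0])
    out_h, out_w = height + 2 * pad_y, width + 2 * pad_x

    def is_known(row: int, col: int) -> bool:
        # True when (row, col) falls inside the original frame's footprint.
        return pad_y <= row < pad_y + height and pad_x <= col < pad_x + width

    mask = [
        [0.0 if is_known(r, c) else 1.0 for c in range(out_w)]
        for r in range(out_h)
    ]
    padded = [
        [
            [frame[r - pad_y][c - pad_x] if is_known(r, c) else fill
             for c in range(out_w)]
            for r in range(out_h)
        ]
        for frame in frames
    ]
    return padded, mask


# Two 2x3 frames extended by one column on each side (horizontal outpainting).
video = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [1.0, 1.0, 1.0]]]
padded_video, mask = make_outpainting_input(video, pad_x=1, pad_y=0)
```

The same mask is reused for every frame because the known region does not move over time. A generative model would then be conditioned on the padded frames and the mask to fill in the unknown border.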

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Video Analysis and Summarization