M3DDM+: An improved video outpainting by a modified masking strategy
Takuya Murakawa, Takumi Fukuzawa, Ning Ding, Toru Tamaki

TL;DR
M3DDM+ enhances video outpainting quality and temporal consistency by aligning training masking strategies with inference requirements, especially in challenging scenarios with limited motion or large outpainting regions.
Contribution
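It introduces a modified masking strategy that applies a uniform mask direction and width across all frames during training, and uses it to fine-tune the pretrained M3DDM latent diffusion model for better outpainting quality and temporal coherence.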
Findings
Substantially improves visual fidelity and temporal coherence over M3DDM in information-limited scenarios.
Maintains computational efficiency.
Effective when inter-frame information is limited, such as with limited camera motion or large outpainting regions.
Abstract
M3DDM provides a computationally efficient framework for video outpainting via latent diffusion modeling. However, it exhibits significant quality degradation -- manifested as spatial blur and temporal inconsistency -- under challenging scenarios characterized by limited camera motion or large outpainting regions, where inter-frame information is limited. We identify the cause as a training-inference mismatch in the masking strategy: M3DDM's training applies random mask directions and widths across frames, whereas inference requires consistent directional outpainting throughout the video. To address this, we propose M3DDM+, which applies uniform mask direction and width across all frames during training, followed by fine-tuning of the pretrained M3DDM model. Experiments demonstrate that M3DDM+ substantially improves visual fidelity and temporal coherence in information-limited scenarios…
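The mismatch described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the four-direction set, and the 0.1 to 0.5 width range are hypothetical choices made for illustration only. The first sampler draws a new direction and width for every frame (the M3DDM-style training described above); the second draws them once per clip and reuses them for all frames (the M3DDM+ idea).

```python
import random

# Assumed set of outpainting directions; the paper may use a different set.
DIRECTIONS = ("left", "right", "top", "bottom")


def edge_mask(height, width, direction, ratio):
    """Return an H x W mask as nested lists: 1 = region to outpaint, 0 = known."""
    if direction in ("left", "right"):
        n = round(width * ratio)
        row = [1] * n + [0] * (width - n)
        if direction == "right":
            row.reverse()
        return [list(row) for _ in range(height)]
    n = round(height * ratio)
    rows = [[1] * width for _ in range(n)]
    rows += [[0] * width for _ in range(height - n)]
    if direction == "bottom":
        rows.reverse()
    return rows


def per_frame_random_masks(num_frames, height, width, rng):
    """Sample a new direction and width for every frame (mismatched with inference)."""
    return [
        edge_mask(height, width, rng.choice(DIRECTIONS), rng.uniform(0.1, 0.5))
        for _ in range(num_frames)
    ]


def uniform_masks(num_frames, height, width, rng):
    """Sample direction and width once, then reuse them for all frames."""
    direction = rng.choice(DIRECTIONS)
    ratio = rng.uniform(0.1, 0.5)  # assumed range, for illustration only
    return [edge_mask(height, width, direction, ratio) for _ in range(num_frames)]


if __name__ == "__main__":
    rng = random.Random(0)
    clip = uniform_masks(num_frames=8, height=16, width=32, rng=rng)
    # Every frame shares the same mask, as inference would require.
    print(all(frame == clip[0] for frame in clip))
```

With the uniform sampler, the masked region is identical in every frame of a clip, which matches the consistent directional outpainting that inference requires. With the per-frame sampler, a region hidden in one frame is often visible in a neighbouring frame, a shortcut that is not available at inference time.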
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment · Image Enhancement Techniques
