MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling

Bowen Zhang; Xiaofei Xie; Haotian Lu; Na Ma; Tianlin Li; Qing Guo

arXiv:2405.18003 · cs.CV · May 29, 2024 · 1 citation

TL;DR

MAVIN is a diffusion-based model that generates seamless transition videos between two given segments, addressing challenges in multi-action video generation by focusing on smoothness, coherence, and long-term consistency.

Contribution

MAVIN introduces techniques such as boundary frame guidance and a Gaussian filter mixer for transition video infilling, along with a new metric for evaluating temporal smoothness.

Findings

01. MAVIN outperforms existing methods in generating smooth transition videos.
02. The model effectively handles large infilling gaps and varied transition lengths.
03. Experimental results demonstrate superior temporal coherence and visual quality.

Abstract

Diffusion-based video generation has achieved significant progress, yet generating multiple actions that occur sequentially remains a formidable task. Directly generating a video with sequential actions can be extremely challenging due to the scarcity of fine-grained action annotations and the difficulty in establishing temporal semantic correspondences and maintaining long-term consistency. To tackle this, we propose an intuitive and straightforward solution: splicing multiple single-action video segments sequentially. The core challenge lies in generating smooth and natural transitions between these segments given the inherent complexity and variability of action transitions. We introduce MAVIN (Multi-Action Video INfilling model), designed to generate transition videos that seamlessly connect two given videos, forming a cohesive integrated sequence. MAVIN incorporates several…

Code & Models

Repositories

18445864529/mavin (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Video Coding and Compression Technologies