ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer
Ruonan Yu, Zhenxiong Tan, Zigeng Chen, Songhua Liu, Xinchao Wang

TL;DR
ViFeEdit introduces a novel video-free tuning method for video diffusion transformers that enables controllable video generation and editing using only 2D images, reducing data and computational requirements.
Contribution
The paper proposes a new architectural reparameterization and dual-path pipeline for video diffusion transformers, enabling effective video editing without video training data.
Findings
Achieves versatile video generation and editing with minimal 2D image data.
Maintains temporal consistency in edited videos.
Operates with only a small number of additional parameters.
Abstract
Diffusion Transformers (DiTs) have demonstrated remarkable scalability and quality in image and video generation, prompting growing interest in extending them to controllable generation and editing tasks. However, compared with their image counterparts, video control and editing have progressed slowly, mainly because paired video data are scarce and training video diffusion models is computationally expensive. To address this issue, in this paper, we propose a video-free tuning framework termed ViFeEdit for video diffusion transformers. Without requiring any form of video training data, ViFeEdit achieves versatile video generation and editing while being adapted solely on 2D images. At the core of our approach is an architectural reparameterization that decouples spatial independence from the full 3D attention in modern video diffusion transformers, which enables visually faithful…
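The idea of separating per-frame (spatial) computation from full 3D attention can be illustrated with a toy example. The sketch below is not the authors' reparameterization; it only assumes the common setup in which a video DiT flattens T frames of P spatial tokens into one sequence and applies joint attention over all T·P tokens. Restricting that attention with a frame-block-diagonal mask makes each frame attend only to its own tokens, which is the same as running 2D attention frame by frame, the regime a single image (a one-frame clip) already exercises. All function names here are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries set to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    # Single-head scaled dot-product attention.
    # q, k, v: (N, d); mask: optional (N, N) bool array, True = may attend.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v


def frame_block_mask(num_frames, tokens_per_frame):
    # Hypothetical helper: True only where query and key share a frame,
    # turning full 3D (spatio-temporal) attention into per-frame attention.
    frame_id = np.repeat(np.arange(num_frames), tokens_per_frame)
    return frame_id[:, None] == frame_id[None, :]


def per_frame_attention(q, k, v, num_frames, tokens_per_frame):
    # Reference: apply plain 2D attention to each frame independently.
    outputs = []
    for f in range(num_frames):
        s = slice(f * tokens_per_frame, (f + 1) * tokens_per_frame)
        outputs.append(attention(q[s], k[s], v[s]))
    return np.concatenate(outputs, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, P, d = 4, 6, 8  # frames, tokens per frame, head dimension
    q, k, v = (rng.standard_normal((T * P, d)) for _ in range(3))

    full = attention(q, k, v)  # every token sees every frame
    masked = attention(q, k, v, frame_block_mask(T, P))
    spatial = per_frame_attention(q, k, v, T, P)

    print("masked == per-frame:", np.allclose(masked, spatial))
    print("full == masked:", np.allclose(full, masked))
```

Running the script should report that masked attention matches per-frame attention while full attention differs, since full attention mixes information across frames.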
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Computer Graphics and Visualization Techniques
