ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer

Ruonan Yu; Zhenxiong Tan; Zigeng Chen; Songhua Liu; Xinchao Wang

arXiv:2603.15478 · cs.CV · March 17, 2026

Open Access

TL;DR

ViFeEdit introduces a novel video-free tuning method for video diffusion transformers that enables controllable video generation and editing using only 2D images, reducing data and computational requirements.

Contribution

The paper proposes a new architectural reparameterization and dual-path pipeline for video diffusion transformers, enabling effective video editing without video training data.

Findings

01. Achieves versatile video generation and editing with minimal 2D image data.

02. Maintains temporal consistency in edited videos.

03. Operates with only a small number of additional parameters.

Abstract

Diffusion Transformers (DiTs) have demonstrated remarkable scalability and quality in image and video generation, prompting growing interest in extending them to controllable generation and editing tasks. However, compared with their image counterparts, progress in video control and editing remains limited, mainly due to the scarcity of paired video data and the high computational cost of training video diffusion models. To address this issue, we propose ViFeEdit, a video-free tuning framework for video diffusion transformers. Without requiring any form of video training data, ViFeEdit achieves versatile video generation and editing, adapted solely with 2D images. At the core of our approach is an architectural reparameterization that decouples spatial independence from the full 3D attention in modern video diffusion transformers, which enables visually faithful…
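To make the distinction between full 3D attention and per-frame spatial attention concrete, the following is a minimal sketch in plain Python. It is not the paper's reparameterization, whose details are not given in the truncated abstract above. The function names (`attention_pair_counts`, `same_frame_mask`) and the example token grid are hypothetical and chosen only for illustration. The sketch counts query-key pairs under each scheme and builds the block-diagonal mask that restricts attention to tokens within the same frame.

```python
def attention_pair_counts(frames, height, width):
    """Count query-key pairs for full 3D attention vs. per-frame spatial attention.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tokens_per_frame = height * width
    total_tokens = frames * tokens_per_frame
    # Full 3D attention: every token attends to every token in the clip.
    full_3d = total_tokens ** 2
    # Spatial-only attention: tokens attend only within their own frame.
    spatial_only = frames * tokens_per_frame ** 2
    return full_3d, spatial_only


def same_frame_mask(frames, tokens_per_frame):
    """Block-diagonal mask: entry [i][j] is True when tokens i and j share a frame."""
    n = frames * tokens_per_frame
    return [
        [i // tokens_per_frame == j // tokens_per_frame for j in range(n)]
        for i in range(n)
    ]


# Assumed example sizes: 16 latent frames on a 30 x 52 token grid.
full_3d, spatial_only = attention_pair_counts(frames=16, height=30, width=52)
print(full_3d // spatial_only)  # the ratio equals the number of frames
```

With `frames=1` the two counts are identical, which is one way to see that a single 2D image exercises only the within-frame part of the attention pattern.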

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Computer Graphics and Visualization Techniques