UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian

TL;DR
UniEdit is a tuning-free, unified framework that edits both the motion and the appearance of a video using a pre-trained text-to-video generator.
Contribution
The paper introduces UniEdit, a framework that supports both video motion and appearance editing without tuning. It relies on a pre-trained text-to-video generator, with auxiliary motion-reference and reconstruction branches whose features are injected into the main editing path.
Findings
Outperforms state-of-the-art video editing methods
Supports diverse motion and appearance editing scenarios
Demonstrates effective preservation of source content
Abstract
Recent advances in text-guided video editing have showcased promising results in appearance editing (e.g., stylization). However, video motion editing in the temporal dimension (e.g., from eating to waving), which distinguishes video editing from image editing, is underexplored. In this work, we present UniEdit, a tuning-free framework that supports both video motion and appearance editing by harnessing the power of a pre-trained text-to-video generator within an inversion-then-generation framework. To realize motion editing while preserving source video content, based on the insights that temporal and spatial self-attention layers encode inter-frame and intra-frame dependency respectively, we introduce auxiliary motion-reference and reconstruction branches to produce text-guided motion and source features respectively. The obtained features are then injected into the main editing path…
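The abstract is cut off before it says how the auxiliary features are injected, so the following is only a minimal sketch of one common form of attention feature injection, not the paper's stated mechanism. It assumes injection means that the main editing path keeps its own queries while taking keys and values from an auxiliary branch. The function names, the identity projections, and the toy inputs are all hypothetical. In a spatial self-attention layer the tokens would be positions within one frame; in a temporal layer they would be the same position across frames.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out


def self_attention_with_injection(main_tokens, injected_tokens=None):
    """Self-attention whose keys/values may come from an auxiliary branch.

    Assumption (not from the paper excerpt): injection replaces the main
    path's keys and values while the queries stay with the main path.
    Learned Q/K/V projections are omitted (identity) for brevity.
    """
    source = main_tokens if injected_tokens is None else injected_tokens
    return attention(main_tokens, source, source)


# Toy inputs: two tokens with two channels each (hypothetical values).
main = [[1.0, 0.0], [0.0, 1.0]]
reconstruction = [[2.0, 2.0], [2.0, 2.0]]

plain = self_attention_with_injection(main)
injected = self_attention_with_injection(main, reconstruction)
```

Because every injected value vector in this toy example is identical, each output token of the injected call equals that vector regardless of the attention weights, which makes the effect of swapping keys and values easy to see.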
Taxonomy
Topics: Face recognition and analysis · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
