Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion
Yangfan He, Sida Li, Jianhui Wang, Kun Li, Xinyuan Song, Xinhang Yuan, Keqin Li, Kuan Lu, Menghao Huo, Jingqun Tang, Yi Xin, Jiaqi Chen, Miao Zhang, Xueqian Wang

TL;DR
This paper introduces a lightweight adapter framework that improves temporal and spatial consistency in diffusion-based video editing by integrating novel modules for frame, spatial, and semantic coherence, without extensive retraining.
Contribution
The proposed GE-Adapter combines three modules, targeting frame-level temporal, spatial, and semantic consistency, to improve coherence in low-cost diffusion-based video editing while reducing training costs.
Findings
Significantly improves perceptual quality and temporal coherence.
Enhances text-image alignment and frame-to-frame consistency.
Achieves better fidelity in video editing tasks.
Abstract
Recent advancements in text-to-image (T2I) generation using diffusion models have enabled cost-effective video-editing applications by leveraging pre-trained models, eliminating the need for resource-intensive training. However, the frame-independence of T2I generation often results in poor temporal consistency. Existing methods address this issue through temporal layer fine-tuning or inference-based temporal propagation, but these approaches suffer from high training costs or limited temporal coherence. To address these challenges, we propose a General and Efficient Adapter (GE-Adapter) that integrates temporal-spatial and semantic consistency with Bilateral DDIM inversion. This framework introduces three key components: (1) Frame-based Temporal Consistency Blocks (FTC Blocks) to capture frame-specific features and enforce smooth inter-frame transitions via temporally-aware loss…
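For readers unfamiliar with the inversion step the abstract builds on, the following is a minimal sketch of standard deterministic DDIM inversion (eta = 0). It is not the paper's own inversion variant, whose details are not given in this excerpt. The function names, the toy noise predictor, and the schedule values are all hypothetical and chosen only for illustration.

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update (eta = 0) between two noise levels.

    alpha_from and alpha_to are cumulative schedule products (alpha-bar).
    Moving to a smaller alpha-bar adds noise (inversion); moving to a
    larger one removes noise (sampling). eps is the noise predicted at x.
    """
    out = []
    for value, noise in zip(x, eps):
        # Estimate the clean sample implied by x and the predicted noise.
        x0_pred = (value - math.sqrt(1.0 - alpha_from) * noise) / math.sqrt(alpha_from)
        # Re-noise that estimate to the target noise level.
        out.append(math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * noise)
    return out

def toy_eps_model(x, t):
    # Hypothetical stand-in for a pretrained T2I noise predictor.
    return [0.1 * value for value in x]

def invert(x0, alphas):
    """Run inversion over a decreasing alpha-bar schedule; return all latents."""
    x = list(x0)
    trajectory = [x]
    for t in range(len(alphas) - 1):
        eps = toy_eps_model(x, t)
        x = ddim_step(x, eps, alphas[t], alphas[t + 1])
        trajectory.append(x)
    return trajectory

# Example: invert a two-element "latent" over three noise levels.
latents = invert([1.0, 2.0], [0.99, 0.9, 0.5])
```

In video editing, each frame's latent is typically inverted this way before the edited prompt is applied during sampling. Because every frame is inverted independently, the step itself does nothing to tie frames together, which is the temporal-consistency gap the abstract describes.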
Taxonomy
Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Video Coding and Compression Technologies
Methods: Diffusion · Adapter
