Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion

Yangfan He; Sida Li; Jianhui Wang; Kun Li; Xinyuan Song; Xinhang Yuan; Keqin Li; Kuan Lu; Menghao Huo; Jingqun Tang; Yi Xin; Jiaqi Chen; Miao Zhang; Xueqian Wang

arXiv:2501.04606 · cs.CV · June 12, 2025 · 2 cites





TL;DR

This paper introduces a lightweight adapter framework that improves temporal and spatial consistency in diffusion-based video editing by integrating novel modules for frame, spatial, and semantic coherence, without extensive retraining.

Contribution

The proposed GE-Adapter combines three innovative modules to enhance temporal and spatial coherence in low-cost diffusion-based video editing, reducing training costs and improving quality.
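Adapters of this kind usually keep the pretrained weights frozen and train only a small number of extra parameters, which is where the training-cost savings come from. The sketch below is a generic residual bottleneck adapter written with NumPy, shown only to illustrate that idea; it is not the GE-Adapter architecture, and the class name `BottleneckAdapter`, the dimensions, and the zero initialisation of the up-projection are illustrative assumptions.

```python
import numpy as np


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: x + relu(x @ W_down) @ W_up.

    Generic illustration only; not the GE-Adapter modules described above.
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random down-projection into the bottleneck.
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        # Zero up-projection, so the adapter starts as an identity mapping
        # and does not disturb the frozen backbone before training.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (num_tokens, dim) features taken from a frozen backbone layer.
        hidden = np.maximum(x @ self.w_down, 0.0)
        return x + hidden @ self.w_up

    def num_params(self) -> int:
        # Only these two matrices would be trained.
        return self.w_down.size + self.w_up.size


dim, bottleneck = 320, 32
adapter = BottleneckAdapter(dim, bottleneck)
features = np.random.default_rng(1).normal(size=(16, dim))
out = adapter(features)
print(out.shape)                   # (16, 320)
print(np.allclose(out, features))  # True: identity at initialisation
print(adapter.num_params())        # 20480 (2 * 320 * 32)
print(dim * dim)                   # 102400 for a single full 320 x 320 layer
```

The bottleneck keeps the trainable parameter count at a fraction of even one full-width layer, and the zero-initialised up-projection means inserting the adapter leaves the pretrained model's outputs unchanged until training begins.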

Findings

01

Significantly improves perceptual quality and temporal coherence.

02

Enhances text-image alignment and frame-to-frame consistency.

03

Achieves better fidelity in video editing tasks.
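Frame-to-frame consistency in text-driven video editing is commonly reported as the mean cosine similarity between embeddings of consecutive frames, often CLIP image embeddings. This page does not state the paper's exact evaluation protocol, so the following is a generic sketch that assumes per-frame embeddings are already computed; `frame_consistency` is a hypothetical helper.

```python
import numpy as np


def frame_consistency(embeddings: np.ndarray) -> float:
    """Mean cosine similarity between consecutive frame embeddings.

    embeddings: (num_frames, dim) array, one embedding per frame
    (which encoder produces them is an assumption left to the caller).
    """
    if embeddings.shape[0] < 2:
        raise ValueError("need at least two frames")
    # L2-normalise each frame embedding.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Row-wise dot product between frame t and frame t + 1.
    sims = np.sum(normed[:-1] * normed[1:], axis=1)
    return float(sims.mean())


static = np.tile(np.array([[1.0, 0.0]]), (4, 1))
print(frame_consistency(static))   # 1.0: identical frames

flicker = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(frame_consistency(flicker))  # 0.0: orthogonal consecutive frames
```

Higher values indicate smoother transitions; because the score only compares neighbouring frames, it penalises flicker but does not by itself measure text alignment.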

Abstract

Recent advances in text-to-image (T2I) generation using diffusion models have enabled cost-effective video-editing applications by leveraging pre-trained models, eliminating the need for resource-intensive training. However, because T2I models generate each frame independently, the results often have poor temporal consistency. Existing methods address this issue through temporal-layer fine-tuning or inference-based temporal propagation, but these approaches suffer from high training costs or limited temporal coherence. To address these challenges, we propose a General and Efficient Adapter (GE-Adapter) that integrates temporal-spatial and semantic consistency with Bilateral DDIM inversion. This framework introduces three key components: (1) Frame-based Temporal Consistency Blocks (FTC Blocks) to capture frame-specific features and enforce smooth inter-frame transitions via temporally-aware loss…
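The abstract names Bilateral DDIM inversion but the truncated text does not describe it. As background only, the sketch below shows the standard deterministic (eta = 0) DDIM inversion that such schemes build on; it is not the paper's bilateral variant. The noise schedule, array shapes, and `toy_eps_model` (a stand-in for a diffusion UNet) are hypothetical.

```python
import numpy as np


def ddim_step(x, eps, alpha_from, alpha_to):
    # Deterministic DDIM update between two cumulative alphas:
    # predict the clean sample, then re-noise it to the target level.
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps


def ddim_invert(x0, alphas_cumprod, eps_model):
    # Walk from the clean end (index 0) to the noisiest index.
    x = x0
    for t in range(len(alphas_cumprod) - 1):
        # Standard approximation: reuse the noise predicted at x_t
        # for the step from t to t + 1.
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas_cumprod[t], alphas_cumprod[t + 1])
    return x


def ddim_sample(x_t, alphas_cumprod, eps_model):
    # Walk back from the noisiest index to index 0.
    x = x_t
    for t in range(len(alphas_cumprod) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas_cumprod[t], alphas_cumprod[t - 1])
    return x


rng = np.random.default_rng(0)
alphas_cumprod = np.linspace(0.9999, 0.05, 10)  # hypothetical schedule
fixed_noise = rng.normal(size=(4, 8))


def toy_eps_model(x, t):
    # Hypothetical predictor that ignores its input.
    return fixed_noise


x0 = rng.normal(size=(4, 8))  # stands in for one frame's latent
x_noisy = ddim_invert(x0, alphas_cumprod, toy_eps_model)
recon = ddim_sample(x_noisy, alphas_cumprod, toy_eps_model)
print(np.allclose(recon, x0))  # True: exact round trip when eps ignores x
```

With a real noise predictor the prediction depends on `x` and `t`, so the reuse approximation in `ddim_invert` makes the round trip only approximate. Errors of this kind, compounded independently in each frame, are presumably what improved inversion schemes for video editing aim to reduce.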


Code & Models

Repositories

codepassionor/T2I_Adapter (PyTorch)


Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Video Coding and Compression Technologies

Methods: Diffusion · Adapter