Low-Cost Test-Time Adaptation for Robust Video Editing

Jianhui Wang; Yinda Chen; Yangfan He; Xinyuan Song; Yi Xin; Dapeng Zhang; Zhongwei Wan; Bin Li; Rongchao Zhang

arXiv:2507.21858 · cs.CV · July 30, 2025

TL;DR

Vid-TTA introduces a lightweight, self-supervised test-time adaptation framework that enhances the temporal consistency and robustness of video editing models without significant computational costs.

Contribution

The paper proposes a motion-aware frame reconstruction mechanism and a prompt perturbation strategy, combined with meta-learning for dynamic loss balancing, to improve video editing quality during inference.
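To make the idea of dynamic loss balancing concrete, here is a minimal sketch of how several auxiliary losses could be combined under adjustable weights. This summary does not describe the paper's meta-learning procedure, so everything below is an assumption for illustration: the names `softmax` and `balanced_loss`, the use of softmax-normalized logits as weights, and the example loss values are hypothetical and are not taken from Vid-TTA.

```python
import math


def softmax(logits):
    """Turn unconstrained logits into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_loss(losses, logits):
    """Weighted sum of auxiliary losses (hypothetical balancing scheme).

    losses: per-task loss values, e.g. [reconstruction, prompt_perturbation]
    logits: one adjustable logit per task; an outer (meta) update could
            change these between adaptation steps. That outer update is
            not shown because the summary does not specify it.
    """
    if len(losses) != len(logits):
        raise ValueError("need exactly one logit per loss")
    weights = softmax(logits)
    return sum(w * l for w, l in zip(weights, losses))


# Equal logits weight both tasks equally, giving the plain average.
example = balanced_loss([1.0, 3.0], [0.0, 0.0])
```

Raising one logit relative to the other shifts weight toward that task while keeping the weights normalized, which is one simple way to keep a single auxiliary objective from dominating adaptation.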

Findings

01  Significantly improves temporal consistency in video editing.

02  Reduces overfitting to simple prompts.

03  Maintains low computational overhead during adaptation.

Abstract

Video editing is a critical component of content creation that transforms raw footage into coherent works aligned with specific visual and narrative objectives. Existing approaches face two major challenges: temporal inconsistencies due to failure in capturing complex motion patterns, and overfitting to simple prompts arising from limitations in UNet backbone architectures. While learning-based methods can enhance editing quality, they typically demand substantial computational resources and are constrained by the scarcity of high-quality annotated data. In this paper, we present Vid-TTA, a lightweight test-time adaptation framework that personalizes optimization for each test video during inference through self-supervised auxiliary tasks. Our approach incorporates a motion-aware frame reconstruction mechanism that identifies and preserves crucial movement regions, alongside a prompt…
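As a rough illustration of what a motion-aware reconstruction objective might look like, the sketch below marks pixels that change strongly between two consecutive frames and weights their reconstruction error more heavily. The abstract does not give the actual formulation, so the frame-difference threshold, the weighting scheme, and the names `motion_mask` and `motion_weighted_mse` are all assumptions made for this example, not the authors' method. Frames are plain nested lists of grayscale values in [0, 1].

```python
def motion_mask(prev_frame, next_frame, threshold=0.1):
    """Mark pixels whose intensity changes by more than `threshold`.

    Assumption: a simple absolute frame difference stands in for
    whatever motion estimate the paper actually uses.
    """
    return [
        [1.0 if abs(b - a) > threshold else 0.0 for a, b in zip(row_p, row_n)]
        for row_p, row_n in zip(prev_frame, next_frame)
    ]


def motion_weighted_mse(recon, target, mask, motion_weight=2.0):
    """Weighted mean squared error that emphasizes moving regions.

    Static pixels get weight 1.0; pixels flagged in `mask` get
    `motion_weight` (a hypothetical hyperparameter).
    """
    total = 0.0
    weight_sum = 0.0
    for r_row, t_row, m_row in zip(recon, target, mask):
        for r, t, m in zip(r_row, t_row, m_row):
            w = 1.0 + (motion_weight - 1.0) * m
            total += w * (r - t) ** 2
            weight_sum += w
    return total / weight_sum


# Tiny 2x2 example: only the top-right pixel moves between frames.
prev_frame = [[0.0, 0.0], [0.0, 0.0]]
next_frame = [[0.0, 1.0], [0.0, 0.0]]
mask = motion_mask(prev_frame, next_frame)
recon = [[0.0, 0.5], [0.0, 0.0]]
loss = motion_weighted_mse(recon, next_frame, mask)
```

In a test-time adaptation loop, a loss of this kind would be minimized on the test video itself, so no annotated data is needed.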
