VALA: Learning Latent Anchors for Training-Free and Temporally Consistent

Zhangkai Wu; Xuhui Fan; Zhongyuan Xie; Kaize Shi; Longbing Cao

arXiv:2510.22970 · cs.CV · October 28, 2025

TL;DR

VALA introduces a variational alignment module that adaptively selects key frames and compresses their features into semantic anchors, enhancing temporal consistency and efficiency in training-free video editing with diffusion models.

Contribution

The paper proposes a variational framework with a contrastive learning objective that learns latent anchors for temporally consistent, training-free video editing.

Findings

01. Achieves state-of-the-art inversion fidelity and editing quality.

02. Improves temporal consistency in video editing.

03. Offers enhanced efficiency over prior methods.

Abstract

Recent advances in training-free video editing have enabled lightweight and precise cross-frame generation by leveraging pre-trained text-to-image diffusion models. However, existing methods often rely on heuristic frame selection to maintain temporal consistency during DDIM inversion, which introduces manual bias and reduces the scalability of end-to-end inference. In this paper, we propose VALA (Variational Alignment for Latent Anchors), a variational alignment module that adaptively selects key frames and compresses their latent features into semantic anchors for consistent video editing. To learn meaningful assignments, VALA uses a variational framework with a contrastive learning objective. It can thus transform cross-frame latent representations into compressed latent anchors that preserve both content and temporal coherence.…
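
To make the idea of compressing per-frame latents into a few anchors concrete, here is a minimal sketch. It is not the paper's method: it swaps in a plain soft k-means style assignment in place of VALA's variational framework and contrastive objective, which the abstract does not specify. The function names (`soft_assign`, `update_anchors`, `compress_to_anchors`), the `temperature` parameter, and the evenly spaced initialization are all hypothetical choices made for illustration.

```python
import math


def soft_assign(frames, anchors, temperature=1.0):
    """Softmax weights of each frame over the anchors (each row sums to 1)."""
    weights = []
    for frame in frames:
        # Negative squared distance to each anchor acts as the logit.
        logits = [
            -sum((x - a) ** 2 for x, a in zip(frame, anchor)) / temperature
            for anchor in anchors
        ]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def update_anchors(frames, weights, num_anchors):
    """Recompute each anchor as the weighted mean of all frame latents."""
    dim = len(frames[0])
    anchors = []
    for k in range(num_anchors):
        mass = max(sum(w[k] for w in weights), 1e-12)  # guard against zero mass
        anchors.append([
            sum(w[k] * frame[d] for frame, w in zip(frames, weights)) / mass
            for d in range(dim)
        ])
    return anchors


def compress_to_anchors(frames, num_anchors, steps=10, temperature=1.0):
    """Alternate soft assignment and anchor updates (assumed stand-in only)."""
    # Assumption: initialize anchors from evenly spaced frames.
    last = len(frames) - 1
    picks = [round(i * last / max(num_anchors - 1, 1)) for i in range(num_anchors)]
    anchors = [list(frames[i]) for i in picks]
    for _ in range(steps):
        weights = soft_assign(frames, anchors, temperature)
        anchors = update_anchors(frames, weights, num_anchors)
    return anchors, soft_assign(frames, anchors, temperature)


# Toy "latents": two visually similar frames, then two different ones.
toy_frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
toy_anchors, toy_weights = compress_to_anchors(toy_frames, num_anchors=2)
```

In this toy setup, frames that sit close together in latent space end up sharing an anchor, which is the intuition behind replacing heuristic key-frame selection with learned assignments. A real implementation would operate on diffusion latents and would learn the assignments with the objective described in the paper.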
