Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence

Shuai Yang; Junxin Lin; Yifan Zhou; Ziwei Liu; Chen Change Loy

arXiv:2512.03905·cs.CV·December 4, 2025

Open Access

TL;DR

FRESCO is a novel framework that improves zero-shot video translation and editing by integrating intra- and inter-frame correspondence, ensuring high spatial-temporal consistency and visual coherence in manipulated videos.

Contribution

The paper introduces FRESCO, a new method that explicitly optimizes features for zero-shot video translation and editing, surpassing existing attention-based approaches in temporal consistency.

Findings

01. FRESCO achieves superior spatial-temporal consistency in manipulated videos.

02. The method outperforms current zero-shot techniques in video quality and coherence.

03. Experiments confirm the effectiveness of FRESCO on video translation and editing tasks.

Abstract

The remarkable success of text-to-image diffusion models has motivated extensive investigation of their potential for video applications. Zero-shot techniques aim to adapt image diffusion models to video without further model training. Recent methods largely focus on integrating inter-frame correspondence into attention mechanisms. However, the soft constraint used to identify the valid features to attend to is insufficient, which can lead to temporal inconsistency. In this paper, we present FRESCO, which integrates intra-frame correspondence with inter-frame correspondence to formulate a more robust spatial-temporal constraint. This enhancement ensures a consistent transformation of semantically similar content between frames. Our method goes beyond attention guidance to explicitly optimize features, achieving high spatial-temporal consistency with the input video,…
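
The abstract does not state the objective itself. As a purely illustrative sketch, and not the paper's formulation, the idea of pairing an inter-frame constraint with an intra-frame one might look like the following. Everything here is an assumption introduced for illustration: the function names, the use of cosine self-similarity for the intra-frame term, the L1 distance for the inter-frame term, the `match` and `valid` inputs (a hypothetical per-position correspondence and visibility mask), and the `weight` parameter.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)

def self_similarity(feats):
    """Pairwise cosine similarity between all positions of one frame."""
    return [[cosine(u, v) for v in feats] for u in feats]

def intra_frame_loss(edited, source):
    """Assumed intra-frame term: the edited frame should keep the
    similarity structure of the source frame."""
    sim_e, sim_s = self_similarity(edited), self_similarity(source)
    n = len(edited)
    total = sum((sim_e[i][j] - sim_s[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)

def inter_frame_loss(edited_prev, edited_cur, match, valid):
    """Assumed inter-frame term: position i of the current frame should be
    close to its matched position match[i] in the previous frame.
    valid[i] is False where no reliable match exists (e.g. occlusion)."""
    total = 0.0
    for i, (j, keep) in enumerate(zip(match, valid)):
        if keep:
            total += sum(abs(a - b) for a, b in zip(edited_cur[i], edited_prev[j]))
    return total / max(sum(1 for keep in valid if keep), 1)

def spatial_temporal_loss(edited_prev, edited_cur, source_cur, match, valid, weight=1.0):
    """Hypothetical combined objective that a feature optimizer could minimize."""
    return (inter_frame_loss(edited_prev, edited_cur, match, valid)
            + weight * intra_frame_loss(edited_cur, source_cur))
```

In this sketch the inter-frame term alone constrains only matched positions, while the intra-frame term also ties together positions that look alike within a frame, which is one way to read the abstract's claim that the combined constraint is more robust.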

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image Processing Techniques