Region-Constraint In-Context Generation for Instructional Video Editing

Zhongwei Zhang; Fuchen Long; Wei Li; Zhaofan Qiu; Wu Liu; Ting Yao; Tao Mei

arXiv:2512.17650·cs.CV·December 22, 2025

TL;DR

ReCo introduces a region-constraint in-context generation method for instructional video editing. It targets more accurate editing regions and less token interference between editing and non-editing areas, using latent and attention regularization together with a large-scale dataset.

Contribution

The paper proposes a new constraint modeling approach for video editing, including regularization methods and a large dataset, advancing instruction-based video editing capabilities.

Findings

01. ReCo achieves superior performance on four video editing tasks.

02. The regularization techniques effectively reduce editing errors.

03. The large-scale dataset enhances model training and generalization.

Abstract

The in-context generation paradigm has recently demonstrated strong power in instructional image editing, offering both data efficiency and synthesis quality. Nevertheless, adapting such in-context learning to instruction-based video editing is not trivial. Without specified editing regions, the results can suffer from inaccurate editing regions and from token interference between editing and non-editing areas during denoising. To address these issues, we present ReCo, a new instructional video editing paradigm that models constraints between editing and non-editing regions during in-context generation. Technically, ReCo concatenates the source and target videos width-wise for joint denoising. To calibrate video diffusion learning, ReCo capitalizes on two regularization terms, i.e., latent and attention regularization, conducted on one-step backward denoised latents…
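
As a rough illustration of the width-wise concatenation mentioned in the abstract, the sketch below joins a source clip and a target clip side by side, frame by frame, and then splits them apart again. It is not ReCo's code: plain nested Python lists stand in for video latents, the function names and toy sizes are hypothetical, and the denoising and regularization steps are left out entirely.

```python
from typing import List, Tuple

Frame = List[List[float]]  # H rows, each holding W values
Video = List[Frame]        # T frames


def concat_widthwise(source: Video, target: Video) -> Video:
    """Place each target frame to the right of the matching source frame."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of frames")
    joint: Video = []
    for src_frame, tgt_frame in zip(source, target):
        if len(src_frame) != len(tgt_frame):
            raise ValueError("frames must have the same height")
        # Concatenating row by row extends the width axis only.
        joint.append([s_row + t_row for s_row, t_row in zip(src_frame, tgt_frame)])
    return joint


def split_widthwise(joint: Video, source_width: int) -> Tuple[Video, Video]:
    """Undo concat_widthwise, given the width of the source half."""
    source = [[row[:source_width] for row in frame] for frame in joint]
    target = [[row[source_width:] for row in frame] for frame in joint]
    return source, target


# Toy clips (assumed sizes): T=1 frame, H=2 rows, W=2 columns each.
source = [[[1.0, 2.0], [3.0, 4.0]]]
target = [[[5.0, 6.0], [7.0, 8.0]]]

joint = concat_widthwise(source, target)
src_back, tgt_back = split_widthwise(joint, source_width=2)
```

With this layout, the left half of every joint frame holds the source and the right half holds the target, so a model that processes the joint clip sees both within one sequence.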

Code & Models

Models

🤗 HiDream-ai/ReCo · model · ♡ 13

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Visual Attention and Saliency Detection