UNIC: Unified In-Context Video Editing

Zixuan Ye; Xuanhua He; Quande Liu; Qiulin Wang; Xintao Wang; Pengfei Wan; Di Zhang; Kun Gai; Qifeng Chen; Wenhan Luo

arXiv:2506.04216 · cs.CV · June 5, 2025

TL;DR

UNIC presents a unified framework for diverse in-context video editing tasks, leveraging token-based input representation and task-aware positional encoding to enable flexible, task-specific editing within a single model.

Contribution

The paper introduces a novel unified in-context video editing framework that eliminates task-specific modules by modeling multiple editing tasks as token sequences with task-aware encoding.

Findings

1. Achieves superior performance across six video editing tasks
2. Supports flexible task composition and emergent abilities
3. Eliminates need for task-specific adapter modules

Abstract

Recent advances in text-to-video generation have sparked interest in generative video editing tasks. Previous methods often rely on task-specific architectures (e.g., additional adapter modules) or dedicated customizations (e.g., DDIM inversion), which limit the integration of versatile editing conditions and the unification of various editing tasks. In this paper, we introduce UNified In-Context Video Editing (UNIC), a simple yet effective framework that unifies diverse video editing tasks within a single model in an in-context manner. To achieve this unification, we represent the inputs of various video editing tasks as three types of tokens: the source video tokens, the noisy video latent, and the multi-modal conditioning tokens that vary according to the specific editing task. Based on this formulation, our key insight is to integrate these three types into a single consecutive…
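The three-token formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Token` class, the helper names, the segment ordering, and the placeholder embedding size are all assumptions made for illustration, and the sketch covers only the concatenation of the three token groups into one sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    """Hypothetical token record; field names are illustrative only."""
    kind: str            # "source", "noisy", or "condition" (assumed labels)
    index: int           # position of the token within its own segment
    values: List[float]  # placeholder embedding, all zeros in this sketch


def make_segment(kind: str, count: int, dim: int = 4) -> List[Token]:
    """Create `count` placeholder tokens of one kind."""
    return [Token(kind, i, [0.0] * dim) for i in range(count)]


def build_sequence(source: List[Token],
                   noisy: List[Token],
                   condition: List[Token]) -> List[Token]:
    """Join the three token groups into one consecutive sequence.

    The order used here (source, noisy latent, condition) is an assumption.
    """
    return source + noisy + condition


def segment_spans(sequence: List[Token]) -> List[Tuple[str, int, int]]:
    """Return (kind, start, end) for each contiguous run of one token kind."""
    spans = []
    start = 0
    for i in range(1, len(sequence) + 1):
        # Close the current run at the end of the sequence or on a kind change.
        if i == len(sequence) or sequence[i].kind != sequence[start].kind:
            spans.append((sequence[start].kind, start, i))
            start = i
    return spans


if __name__ == "__main__":
    seq = build_sequence(
        make_segment("source", 3),
        make_segment("noisy", 3),
        make_segment("condition", 2),
    )
    for kind, start, end in segment_spans(seq):
        print(f"{kind}: tokens [{start}, {end})")
```

In this sketch only the length of the condition segment changes between tasks, which is one way to read the claim that a single model can serve several editing tasks without task-specific modules.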

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications