Uni-Animator: Towards Unified Visual Colorization

Xinyuan Chen; Yao Xu; Shaowen Wang; Pengjie Song; Bowen Deng

arXiv:2602.23191 · cs.CV · March 4, 2026

Open Access

TL;DR

Uni-Animator is a unified Diffusion Transformer (DiT) framework for image and video sketch colorization, designed to improve reference alignment, detail preservation, and temporal coherence.

Contribution

It introduces a unified approach that combines visual reference enhancement, physical detail reinforcement, and motion-aware (sketch-based dynamic RoPE) encoding to colorize both image and video sketches.

Findings

01. Achieves competitive results on both image and video sketch colorization.

02. Effectively preserves high-frequency physical details.

03. Maintains robust temporal consistency in videos.

Abstract

We propose Uni-Animator, a novel Diffusion Transformer (DiT)-based framework for unified image and video sketch colorization. Existing sketch colorization methods struggle to unify image and video tasks, suffering from imprecise color transfer with single or multiple references, inadequate preservation of high-frequency physical details, and compromised temporal coherence with motion artifacts in large-motion scenes. To tackle imprecise color transfer, we introduce visual reference enhancement via instance patch embedding, enabling precise alignment and fusion of reference color information. To resolve insufficient physical detail preservation, we design physical detail reinforcement using physical features that effectively capture and retain high-frequency textures. To mitigate motion-induced temporal inconsistency, we propose sketch-based dynamic RoPE encoding that adaptively models…
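For readers unfamiliar with the RoPE the abstract refers to, the following is a minimal sketch of standard rotary position embedding only. The excerpt does not describe how the sketch-based dynamic variant adapts positions, so nothing here should be read as the authors' method. The names `rope` and `dot` and the `base` default are illustrative assumptions.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply standard rotary position embedding to one even-length vector.

    Hypothetical helper for illustration; not taken from the paper.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair (x, y) is rotated by position * frequency,
        # where the frequency decays with the pair index.
        freq = base ** (-i / dim)
        angle = position * freq
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative query/key vectors (made-up values).
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions have the same offset (2), so the attention
# scores agree up to floating-point error.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 5), rope(k, 3))
```

The property shown is that the score between a rotated query and a rotated key depends only on their relative offset, which is why rotary encodings are a natural place to inject motion-related position information. How Uni-Animator does this is not specified in the text above.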

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Face recognition and analysis