Uni-Animator: Towards Unified Visual Colorization
Xinyuan Chen, Yao Xu, Shaowen Wang, Pengjie Song, Bowen Deng

TL;DR
Uni-Animator is a unified Diffusion Transformer (DiT) framework for image and video sketch colorization that targets three weaknesses of existing methods: reference alignment, preservation of high-frequency detail, and temporal coherence.
Contribution
A single framework that handles both image and video sketch colorization, built on three components: visual reference enhancement via instance patch embedding, physical detail reinforcement, and sketch-based dynamic RoPE encoding.
Findings
Achieves competitive results on both image and video sketch colorization.
Effectively preserves high-frequency physical details.
Maintains robust temporal consistency in videos.
Abstract
We propose Uni-Animator, a novel Diffusion Transformer (DiT)-based framework for unified image and video sketch colorization. Existing sketch colorization methods struggle to unify image and video tasks, suffering from imprecise color transfer with single or multiple references, inadequate preservation of high-frequency physical details, and compromised temporal coherence with motion artifacts in large-motion scenes. To tackle imprecise color transfer, we introduce visual reference enhancement via instance patch embedding, enabling precise alignment and fusion of reference color information. To resolve insufficient physical detail preservation, we design physical detail reinforcement using physical features that effectively capture and retain high-frequency textures. To mitigate motion-induced temporal inconsistency, we propose sketch-based dynamic RoPE encoding that adaptively models…
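For readers unfamiliar with RoPE, the following is a minimal sketch of standard rotary position embedding in NumPy, not the paper's sketch-based dynamic variant, whose details are not given in the abstract. The function name `rope_rotate`, the "rotate-half" channel layout, and the default `base` are illustrative assumptions. The sketch shows the property that makes RoPE attractive for sequence modelling: the attention score between a rotated query and a rotated key depends only on their relative position.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply standard rotary position embedding (rotate-half layout).

    x:         array of shape (seq_len, dim), dim must be even
    positions: array of shape (seq_len,), real-valued token positions
    NOTE: this is generic RoPE, not the paper's dynamic encoding.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    half = dim // 2
    # One rotation frequency per channel pair: base^(-i / half)
    inv_freq = base ** (-np.arange(half) / half)
    # Rotation angle for every (position, channel pair)
    angles = positions[:, None] * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    # Pair channel i with channel i + half and rotate each pair in 2D
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))

# Same relative offset (2.0) at two different absolute positions
score_a = rope_rotate(q, np.array([3.0])) @ rope_rotate(k, np.array([1.0])).T
score_b = rope_rotate(q, np.array([10.5])) @ rope_rotate(k, np.array([8.5])).T
print(np.allclose(score_a, score_b))
```

Because `positions` accepts arbitrary real values, a position schedule that varies with the input could in principle be passed in place of fixed integer indices; how Uni-Animator derives such positions from sketches is not specified here.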
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Face recognition and analysis
