VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization
Zixun Fang, Zhiheng Liu, Kai Zhu, Yu Liu, Ka Leong Cheng, Wei Zhai, Yang Cao, Zheng-Jun Zha

TL;DR
VanGogh is a novel multimodal diffusion framework for video colorization that enhances temporal consistency, color fidelity, and user control by integrating feature fusion, depth guidance, and artifact mitigation techniques.
Contribution
It introduces a unified multimodal diffusion approach with a Dual Qformer and novel strategies for improved control and artifact reduction in video colorization.
Findings
Achieves superior temporal consistency and color fidelity.
Reduces flickering artifacts and color overflow.
Enables both global and local user control.
Abstract
Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity. Existing video colorization methods often suffer from color bleeding and lack comprehensive control, particularly under complex motion or diverse semantic cues. To this end, we introduce VanGogh, a unified multimodal diffusion-based framework for video colorization. VanGogh tackles these challenges using a Dual Qformer to align and fuse features from multiple modalities, complemented by a depth-guided generation process and an optical flow loss, which help reduce color overflow. Additionally, a color injection strategy and luma channel replacement are implemented to improve generalization and mitigate flickering artifacts. Thanks to this design, users can exercise both global and local control over the generation process, resulting in…
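The luma channel replacement mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which color space or pipeline stage they use, so the BT.601 full-range YCbCr conversion and the function name `replace_luma` below are assumptions made for illustration. The idea shown is to keep only the chroma of each generated frame and take the luma from the original grayscale input, so brightness cannot flicker independently of the source video.

```python
import numpy as np

# BT.601 full-range RGB -> YCbCr matrix (no offsets, chroma centred on 0).
# Assumption: the abstract does not name the color space actually used.
_RGB2YCC = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])
_YCC2RGB = np.linalg.inv(_RGB2YCC)


def replace_luma(generated_rgb, gray):
    """Swap the luma of generated frames for the input grayscale.

    generated_rgb: float array of shape (..., 3) with values in [0, 1]
    gray:          float array of shape (...) with values in [0, 1]
    Returns an RGB array with the same shape as generated_rgb.
    """
    # Convert to YCbCr; the matrix product allocates a new array,
    # so the caller's input is left untouched.
    ycc = generated_rgb @ _RGB2YCC.T
    # Discard the generated luma and use the source grayscale instead.
    ycc[..., 0] = gray
    # Convert back to RGB and clip values that leave the valid range.
    rgb = ycc @ _YCC2RGB.T
    return np.clip(rgb, 0.0, 1.0)
```

Applied per frame (or to a whole `(T, H, W, 3)` clip at once), this keeps the structure of the output tied to the input video. Clipping can shift the luma slightly for strongly saturated pixels, which a real pipeline would need to handle.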
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Computer Graphics and Visualization Techniques
Methods: Colorization · ALIGN
