Video Colorization with Pre-trained Text-to-Image Diffusion Models
Hanyuan Liu, Minshan Xie, Jinbo Xing, Chengze Li, Tien-Tsin Wong

TL;DR
ColorDiffuser adapts a pre-trained text-to-image latent diffusion model to video colorization, using Color Propagation Attention and an Alternated Sampling Strategy to achieve state-of-the-art results with high color fidelity and temporal consistency.
Contribution
The paper introduces ColorDiffuser, a novel adaptation of pre-trained diffusion models for video colorization with new attention and sampling strategies.
Findings
Achieves state-of-the-art performance on benchmark datasets.
Improves temporal coherence and color vividness.
Outperforms existing methods in color fidelity and visual quality.
Abstract
Video colorization is a challenging task that involves inferring plausible and temporally consistent colors for grayscale frames. In this paper, we present ColorDiffuser, an adaptation of a pre-trained text-to-image latent diffusion model for video colorization. With the proposed adapter-based approach, we repurpose the pre-trained text-to-image model to accept input grayscale video frames, with an optional text description, for video colorization. To enhance temporal coherence and maintain the vividness of colorization across frames, we propose two novel techniques: Color Propagation Attention and the Alternated Sampling Strategy. Color Propagation Attention enables the model to refine its colorization decisions based on a reference latent frame, while the Alternated Sampling Strategy captures spatiotemporal dependencies by using the next and previous adjacent latent frames…
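The two techniques named in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract is truncated and gives no architectural details, so everything below is an assumption made for illustration. In particular, `reference_attention` is plain scaled dot-product cross-attention with no learned projections, the even/odd schedule in `neighbor_index` is only one possible way to alternate between the previous and next frame, and all function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reference_attention(current, reference):
    """Cross-attention from the current frame to a reference frame.

    current, reference: arrays of shape (tokens, channels).
    Queries come from `current`; keys and values come from `reference`.
    Assumption: no learned Q/K/V projections, unlike a real attention layer.
    """
    scale = 1.0 / np.sqrt(current.shape[-1])
    weights = softmax(current @ reference.T * scale, axis=-1)
    return weights @ reference


def neighbor_index(frame_idx, step, num_frames):
    """Pick the adjacent frame used as reference at a given sampling step.

    Assumed schedule: even steps look at the previous frame, odd steps at
    the next frame; indices are clamped at the ends of the clip.
    """
    offset = -1 if step % 2 == 0 else 1
    return min(max(frame_idx + offset, 0), num_frames - 1)


def alternated_pass(latents, step):
    """Apply reference attention to every frame for one sampling step.

    latents: array of shape (frames, tokens, channels).
    """
    num_frames = latents.shape[0]
    out = np.empty_like(latents)
    for i in range(num_frames):
        j = neighbor_index(i, step, num_frames)
        out[i] = reference_attention(latents[i], latents[j])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(5, 16, 8))  # 5 frames, 16 tokens, 8 channels
    for step in range(4):
        latents = alternated_pass(latents, step)
    print(latents.shape)
```

Alternating the direction across steps lets information from both temporal neighbors reach each frame over the course of sampling, without attending to every frame at every step. Whether the paper uses this exact schedule is not stated in the excerpt above.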
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Latent Diffusion Model · Colorization
