ControlCol: Controllability in Automatic Speaker Video Colorization
Rory Ward, John G. Breslin, Peter Corcoran

TL;DR
ControlCol is a new automatic speaker video colorization system that offers user controllability and achieves higher quality than previous methods, validated by quantitative metrics and human preference.
Contribution
It introduces a controllable automatic colorization system for speaker videos that maintains high quality and surpasses state-of-the-art methods.
Findings
ControlCol outperforms the previous state of the art, DeOldify, by 3.5% on the Grid and Lombard Grid datasets, measured with PSNR, SSIM, FID, and FVD.
In a head-to-head human evaluation, ControlCol is preferred over DeOldify 90% of the time.
ControlCol scores better than DeOldify on PSNR, SSIM, FID, and FVD (higher PSNR and SSIM; lower FID and FVD, which are distances).
Abstract
Adding color to black-and-white speaker videos automatically is a highly desirable technique. It is an artistic process that requires interactivity with humans for the best results. Many existing automatic video colorization systems provide little opportunity for the user to guide the colorization process. In this work, we introduce a novel automatic speaker video colorization system which provides controllability to the user while also maintaining high colorization quality relative to state-of-the-art techniques. We name this system ControlCol. ControlCol performs 3.5% better than the previous state-of-the-art DeOldify on the Grid and Lombard Grid datasets when PSNR, SSIM, FID and FVD are used as metrics. This result is also supported by our human evaluation, where in a head-to-head comparison, ControlCol is preferred 90% of the time to DeOldify. Example videos can be seen in the…
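PSNR, one of the four metrics named in the abstract, compares a colorized frame against its original color ground truth using the standard definition PSNR = 10 · log10(MAX² / MSE). The following is a minimal sketch under stated assumptions: frames are 8-bit (MAX = 255) and flattened into plain Python sequences, with all color channels included in the MSE. Averaging per-frame PSNR over a clip is an assumed convention and may not match the paper's evaluation protocol. The helpers `frame_psnr` and `video_psnr` are hypothetical names used only for illustration. PSNR and SSIM are higher-is-better, whereas FID and FVD measure distribution distances, so lower is better.

```python
import math
from typing import Sequence


def frame_psnr(reference: Sequence[float], candidate: Sequence[float],
               max_value: float = 255.0) -> float:
    """PSNR in dB between two frames given as flat sequences of pixel values.

    Hypothetical helper: assumes 8-bit values by default and treats every
    channel value (e.g. R, G, B) as one sample in the mean squared error.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(reference_frames: Sequence[Sequence[float]],
               candidate_frames: Sequence[Sequence[float]],
               max_value: float = 255.0) -> float:
    """Mean per-frame PSNR over a clip (an assumed aggregation convention)."""
    if not reference_frames or len(reference_frames) != len(candidate_frames):
        raise ValueError("clips must be non-empty and the same length")
    scores = [frame_psnr(r, c, max_value)
              for r, c in zip(reference_frames, candidate_frames)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ground_truth = [0, 0, 0, 0]
    colorized = [0, 0, 0, 2]  # one value off by 2, so MSE = 4 / 4 = 1
    print(f"{frame_psnr(ground_truth, colorized):.2f} dB")
```

With an MSE of 1, the example gives 10 · log10(255²) ≈ 48.13 dB. A colorization that drifts further from the ground truth lowers the score.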
Taxonomy
Topics: Speech and Audio Processing · Human Motion and Animation · Handwritten Text Recognition Techniques
Methods: Colorization
