CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang

TL;DR
This paper introduces CoCoCo, a text-guided video inpainting model that improves consistency, controllability, and compatibility through a motion capture module, instance-aware region selection, and a strategy for injecting personalized models, producing high-quality, coherent video clips.
Contribution
The paper presents a new text-guided video inpainting approach with a motion capture module, instance-aware region selection, and personalized model integration, addressing key challenges in the field.
Findings
Improved motion consistency in generated videos
Enhanced textual controllability over inpainted regions
Better model compatibility with personalized models
Abstract
Recent advancements in video generation have been remarkable, yet many existing methods struggle with issues of consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, a stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility. Specifically, we introduce a simple but efficient motion capture module to preserve motion consistency, and design an instance-aware region selection instead of a random region selection to obtain better textual controllability, and utilize a novel strategy to inject some personalized models into our CoCoCo model and thus obtain better model compatibility. Extensive experiments show that our model can generate high-quality video clips.…
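The abstract contrasts instance-aware region selection with random region selection. The sketch below is only an illustration of that contrast, not the authors' implementation: the function names (`random_box_mask`, `instance_mask`, `overlap_fraction`), the toy segmentation map, and the use of a single frame are all assumptions made for the example. It shows why a mask drawn from an object instance lines up with a text prompt describing that object, while a random rectangle usually does not.

```python
import random


def random_box_mask(height, width, rng):
    """Binary mask (1 = region to inpaint) for a random axis-aligned rectangle."""
    top = rng.randrange(height)
    left = rng.randrange(width)
    bottom = rng.randrange(top, height) + 1   # exclusive bound, at least top + 1
    right = rng.randrange(left, width) + 1    # exclusive bound, at least left + 1
    return [
        [1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
        for y in range(height)
    ]


def instance_mask(seg_map, instance_id):
    """Binary mask covering exactly the pixels labelled with instance_id."""
    return [[1 if label == instance_id else 0 for label in row] for row in seg_map]


def overlap_fraction(mask, seg_map, instance_id):
    """Fraction of masked pixels that belong to the given instance."""
    masked = sum(sum(row) for row in mask)
    hits = sum(
        1
        for mask_row, seg_row in zip(mask, seg_map)
        for m, label in zip(mask_row, seg_row)
        if m and label == instance_id
    )
    return hits / masked if masked else 0.0


# Hypothetical 4x6 segmentation map for one frame: 0 = background, 1 and 2 = objects.
seg_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]

rng = random.Random(0)
box = random_box_mask(4, 6, rng)
obj = instance_mask(seg_map, 2)

# The instance-aware mask contains only pixels of the described object;
# the random box generally mixes object and background pixels.
print("instance-aware overlap:", overlap_fraction(obj, seg_map, 2))
print("random-box overlap:", overlap_fraction(box, seg_map, 2))
```

For a video, the same selection would be applied per frame, with the instance identity tracked across frames; how the paper does this is not stated in the text above.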
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Law in Society and Culture
Methods: Inpainting
