CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi; Shihao Zhao; Xianbiao Qi; Jianan Wang; Yukai Shi; Qianyu Chen; Bin Liang; Kam-Fai Wong; Lei Zhang

arXiv:2403.12035 · cs.CV · March 19, 2024 · 1 citation

TL;DR

This paper introduces CoCoCo, a novel text-guided video inpainting model that enhances consistency, controllability, and compatibility by integrating motion capture, region selection, and personalized model injection, leading to high-quality, coherent video generation.

Contribution

The paper presents a new text-guided video inpainting approach with a motion capture module, instance-aware region selection, and personalized model integration, addressing key challenges in the field.

Findings

01 Improved motion consistency in generated videos
02 Enhanced textual controllability over inpainted regions
03 Better model compatibility with personalized models

Abstract

Recent advancements in video generation have been remarkable, yet many existing methods struggle with issues of consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, a stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility. Specifically, we introduce a simple but efficient motion capture module to preserve motion consistency, and design an instance-aware region selection instead of a random region selection to obtain better textual controllability, and utilize a novel strategy to inject some personalized models into our CoCoCo model and thus obtain better model compatibility. Extensive experiments show that our model can generate high-quality video clips.…
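
The abstract contrasts random region selection with instance-aware region selection but does not say how either mask is built. The following is a minimal sketch of that contrast only, not the authors' implementation. It assumes per-frame binary instance masks are already available (for example from a segmentation model) and that the inpainting region is the instance's bounding box. All function names (`random_box_mask`, `instance_box_mask`, `mask_video`) are hypothetical.

```python
import random


def random_box_mask(height, width, rng):
    """Random rectangular mask; 1 marks pixels to inpaint (baseline strategy)."""
    h = rng.randint(1, height)
    w = rng.randint(1, width)
    top = rng.randint(0, height - h)
    left = rng.randint(0, width - w)
    return [
        [1 if top <= y < top + h and left <= x < left + w else 0
         for x in range(width)]
        for y in range(height)
    ]


def instance_box_mask(instance_mask):
    """Mask covering the bounding box of one instance (assumed strategy)."""
    height, width = len(instance_mask), len(instance_mask[0])
    coords = [(y, x) for y in range(height) for x in range(width)
              if instance_mask[y][x]]
    if not coords:
        # No instance pixels in this frame: nothing to inpaint.
        return [[0] * width for _ in range(height)]
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    return [
        [1 if top <= y <= bottom and left <= x <= right else 0
         for x in range(width)]
        for y in range(height)
    ]


def mask_video(frames, instance_masks):
    """Zero out the instance region in every frame; return frames and masks."""
    masks = [instance_box_mask(m) for m in instance_masks]
    masked = [
        [[0 if m else p for p, m in zip(frame_row, mask_row)]
         for frame_row, mask_row in zip(frame, mask)]
        for frame, mask in zip(frames, masks)
    ]
    return masked, masks
```

In this sketch the instance-aware mask always covers a region that corresponds to an object, so a text prompt describing that object has something concrete to align with, whereas a random box may land on arbitrary background.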

Code & Models

Repositories

zibojia/COCOCO (PyTorch, official)

Videos

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Law in Society and Culture

Methods: Inpainting