Improving Text-guided Object Inpainting with Semantic Pre-inpainting
Yifu Chen, Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Zhineng Chen, and Tao Mei

TL;DR
This paper introduces a two-stage framework combining semantic pre-inpainting and diffusion-based object generation to improve text-guided object inpainting, achieving better control and quality.
Contribution
The novel CAT-Diffusion framework decomposes inpainting into semantic inference and diffusion-based generation, enhancing controllability and performance over existing methods.
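To make the two-stage decomposition concrete, here is a minimal toy sketch of the control flow only. It is not the authors' implementation: the function names `semantic_pre_inpaint` and `generate_object`, the array shapes, and the trivial fill rules are all assumptions chosen for illustration. Stage 1 stands in for semantic inference by writing a text embedding into the masked feature positions; stage 2 stands in for the diffusion generator by producing pixels from those features and compositing them into the masked region only.

```python
import numpy as np

def semantic_pre_inpaint(image_feats, token_mask, text_emb):
    """Stage 1 (hypothetical stand-in): fill masked feature tokens.

    image_feats: (N, D) features, token_mask: (N,) bool, text_emb: (D,).
    A real model would predict these features; here we simply copy the
    text embedding into every masked position.
    """
    filled = image_feats.copy()
    filled[token_mask] = text_emb  # broadcasts (D,) over the masked rows
    return filled

def generate_object(image, pixel_mask, semantic_feats):
    """Stage 2 (hypothetical stand-in for the diffusion generator).

    image: (H, W, 3), pixel_mask: (H, W) bool.
    The "generated" content is a constant derived from the semantic
    features; only masked pixels are replaced, the rest are kept.
    """
    generated = np.full(image.shape, semantic_feats.mean())
    return np.where(pixel_mask[..., None], generated, image)

# Toy inputs: 4 feature tokens of width 2, and a 2x2 RGB image.
feats = np.zeros((4, 2))
token_mask = np.array([False, True, True, False])
text_emb = np.array([1.0, 3.0])

image = np.zeros((2, 2, 3))
pixel_mask = np.array([[True, False], [False, False]])

semantic = semantic_pre_inpaint(feats, token_mask, text_emb)
result = generate_object(image, pixel_mask, semantic)
```

The point of the sketch is the ordering: the semantic features for the masked region are computed first, and the generator consumes them as conditioning, instead of a single network handling both jobs at once.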
Findings
Outperforms state-of-the-art methods on the OpenImages-V6 and MSCOCO datasets.
Demonstrates improved controllability of generated objects.
Validates effectiveness through extensive evaluations.
Abstract
Recent years have witnessed the success of large text-to-image diffusion models and their remarkable potential to generate high-quality images. The further pursuit of enhancing the editability of images has sparked significant interest in the downstream task of inpainting a novel object described by a text prompt within a designated region in the image. Nevertheless, the problem is not trivial from two aspects: 1) Solely relying on one single U-Net to align text prompt and visual object across all the denoising timesteps is insufficient to generate desired objects; 2) The controllability of object generation is not guaranteed in the intricate sampling space of diffusion model. In this paper, we propose to decompose the typical single-stage object inpainting into two cascaded processes: 1) semantic pre-inpainting that infers the semantic features of desired objects in a multi-modal…
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
Methods: Max Pooling · Convolution · Diffusion · Adapter · Concatenated Skip Connection · Inpainting · U-Net · ALIGN
