I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting
Nicola Fanelli, Gennaro Vessio, Giovanna Castellano

TL;DR
This paper introduces a novel multi-mask inpainting framework that uses multimodal LLMs to generate prompts for multiple regions, combined with diffusion models for detailed, text-guided image restoration.
Contribution
It proposes a new multi-mask inpainting task and a fine-tuning procedure that lets multimodal LLMs generate a prompt for each masked region automatically, with the aim of improving inpainting accuracy and creativity.
Findings
Effective multi-mask inpainting on artistic images
Generated prompts improve inpainting quality
Method outperforms existing diffusion-based inpainting approaches
Abstract
Inpainting focuses on filling missing or corrupted regions of an image to blend seamlessly with its surrounding content and style. While conditional diffusion models have proven effective for text-guided inpainting, we introduce the novel task of multi-mask inpainting, where multiple regions are simultaneously inpainted using distinct prompts. Furthermore, we design a fine-tuning procedure for multimodal LLMs, such as LLaVA, to generate multi-mask prompts automatically using corrupted images as inputs. These models can generate helpful and detailed prompt suggestions for filling the masked regions. The generated prompts are then fed to Stable Diffusion, which is fine-tuned for the multi-mask inpainting problem using rectified cross-attention, enforcing prompts onto their designated regions for filling. Experiments on digitized paintings from WikiArt and the Densely Captioned Images…
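The abstract does not spell out how rectified cross-attention is computed, so the following is only a minimal sketch of the general idea of binding each prompt to its own region. It assumes the mechanism can be approximated by masking cross-attention logits so that a pixel inside mask r only attends to tokens of prompt r. The function name `region_masked_attention`, the region-id encoding, and the choice to leave background pixels (id -1) unrestricted are all illustrative assumptions, not the authors' implementation.

```python
import math


def region_masked_attention(scores, pixel_region, token_region):
    """Per-pixel softmax over text tokens, blocking tokens from other regions.

    scores:       list of rows, one per pixel, each a list of attention logits
                  (one logit per text token).
    pixel_region: region id per pixel; -1 marks a pixel outside every mask
                  (assumption: such pixels may attend to all tokens).
    token_region: region id per token, i.e. which prompt the token belongs to.
    """
    weights = []
    for row, p_reg in zip(scores, pixel_region):
        # A token is allowed if the pixel is background or the region ids match.
        allowed = [p_reg == -1 or t_reg == p_reg for t_reg in token_region]
        if not any(allowed):
            raise ValueError(f"region {p_reg} has no prompt tokens")
        # Blocked tokens get a logit of -inf, so their softmax weight is 0.
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical toy input: three pixels, four tokens, two prompts of two tokens each.
toy_scores = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]
toy_weights = region_masked_attention(
    toy_scores,
    pixel_region=[0, 1, -1],    # pixel 0 in mask 0, pixel 1 in mask 1, pixel 2 background
    token_region=[0, 0, 1, 1],  # tokens 0-1 from prompt 0, tokens 2-3 from prompt 1
)
```

In this toy call, the pixel in mask 0 places all of its attention on the first prompt's tokens, the pixel in mask 1 on the second prompt's tokens, and the background pixel spreads attention over every token. A real diffusion model would apply the same masking to batched tensors inside each cross-attention layer.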
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
Methods: Inpainting · Diffusion
