I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Nicola Fanelli; Gennaro Vessio; Giovanna Castellano

arXiv:2411.19050·cs.CV·April 11, 2025

TL;DR

This paper introduces a multi-mask inpainting framework in which a fine-tuned multimodal LLM generates a prompt for each masked region and a fine-tuned Stable Diffusion model fills each region according to its own prompt.

Contribution

It proposes the multi-mask inpainting task, in which several regions are inpainted simultaneously from distinct prompts, and a fine-tuning procedure that lets multimodal LLMs such as LLaVA generate those prompts automatically from the corrupted image.

Findings

01 Effective multi-mask inpainting on artistic images

02 Generated prompts improve inpainting quality

03 Method outperforms existing diffusion-based inpainting approaches

Abstract

Inpainting focuses on filling missing or corrupted regions of an image to blend seamlessly with its surrounding content and style. While conditional diffusion models have proven effective for text-guided inpainting, we introduce the novel task of multi-mask inpainting, where multiple regions are simultaneously inpainted using distinct prompts. Furthermore, we design a fine-tuning procedure for multimodal LLMs, such as LLaVA, to generate multi-mask prompts automatically using corrupted images as inputs. These models can generate helpful and detailed prompt suggestions for filling the masked regions. The generated prompts are then fed to Stable Diffusion, which is fine-tuned for the multi-mask inpainting problem using rectified cross-attention, enforcing prompts onto their designated regions for filling. Experiments on digitized paintings from WikiArt and the Densely Captioned Images…
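The abstract says rectified cross-attention enforces each prompt onto its designated region. The general idea can be illustrated with a minimal sketch, which is not the authors' implementation: attention logits between a pixel and the tokens of other regions' prompts are excluded before the softmax. The function name `region_restricted_attention`, its arguments, and the choice to leave unmasked pixels unrestricted are all assumptions made for illustration.

```python
import math

def region_restricted_attention(scores, pixel_region, token_region):
    """Illustrative sketch only (hypothetical helper, not from the paper's code).

    scores[i][j]    raw attention logit between pixel i and prompt token j
    pixel_region[i] index of the mask containing pixel i, or None if unmasked
    token_region[j] index of the mask whose prompt contains token j
    Returns one row of attention weights per pixel.
    """
    weights = []
    for row, region in zip(scores, pixel_region):
        # A token is visible to a pixel if the token belongs to the prompt of
        # the pixel's mask. Assumption: unmasked pixels see every token.
        allowed = [region is None or t == region for t in token_region]
        visible = [s for s, ok in zip(row, allowed) if ok]
        if not visible:
            # Region without any prompt tokens: no attention mass to assign.
            weights.append([0.0] * len(row))
            continue
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Three pixels, three prompt tokens: token 0 belongs to mask 0,
# tokens 1 and 2 belong to mask 1; the third pixel is unmasked.
example = region_restricted_attention(
    scores=[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]],
    pixel_region=[0, 1, None],
    token_region=[0, 1, 1],
)
```

In this toy example the pixel inside mask 0 places all of its attention on token 0, while the pixel inside mask 1 distributes attention only over tokens 1 and 2. In a real diffusion model the same restriction would typically be applied as an additive mask on the cross-attention logits of each attention layer.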

Code & Models

Repositories

cilabuniba/i-dream-my-painting (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation

Methods: Inpainting · Diffusion