TL;DR
Uni-paint is a multimodal image inpainting framework built on pretrained diffusion models. It supports several guidance modes without task-specific training, which gives users more control over image editing and lets the method generalize across modalities.
Contribution
A unified multimodal inpainting framework, built on pretrained diffusion models, that requires no task-specific training and gives users more control over image editing.
Findings
Achieves results comparable to existing single-modal methods
Supports unconditional, text-driven, stroke-driven, and exemplar-driven inpainting, as well as combinations of these modes
Demonstrates strong few-shot generalization capabilities
Abstract
Recently, text-to-image denoising diffusion probabilistic models (DDPMs) have demonstrated impressive image generation capabilities and have also been successfully applied to image inpainting. However, in practice, users often require more control over the inpainting process beyond textual guidance, especially when they want to composite objects with customized appearance, color, shape, and layout. Unfortunately, existing diffusion-based inpainting methods are limited to single-modal guidance and require task-specific training, hindering their cross-modal scalability. To address these limitations, we propose Uni-paint, a unified framework for multimodal inpainting that offers various modes of guidance, including unconditional, text-driven, stroke-driven, exemplar-driven inpainting, as well as a combination of these modes. Furthermore, our Uni-paint is based on pretrained Stable…
Taxonomy
Methods: Diffusion · Inpainting
