Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin, Jooyoung Choi, Heeseung Kim, Sungroh Yoon

TL;DR
This paper introduces Diptych Prompting, a zero-shot method for subject-driven text-to-image generation that uses inpainting with a reference image to achieve precise subject alignment and high-quality results.
Contribution
It presents a novel zero-shot inpainting-based approach that improves subject alignment and detail in text-to-image generation without fine-tuning.
Findings
Outperforms existing zero-shot prompting methods in visual quality.
Supports stylized image generation and editing.
Achieves better subject alignment and detail fidelity.
Abstract
Subject-driven text-to-image generation aims to produce images of a new subject within a desired context by accurately capturing both the visual characteristics of the subject and the semantic content of a text prompt. Traditional methods rely on time- and resource-intensive fine-tuning for subject alignment, while recent zero-shot approaches leverage on-the-fly image prompting, often sacrificing subject alignment. In this paper, we introduce Diptych Prompting, a novel zero-shot approach that reinterprets the task as an inpainting task with precise subject alignment by leveraging the emergent property of diptych generation in large-scale text-to-image models. Diptych Prompting arranges an incomplete diptych with the reference image in the left panel, and performs text-conditioned inpainting on the right panel. We further prevent unwanted content leakage by removing the background in the…
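The diptych arrangement described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names `make_diptych` and `diptych_prompt`, the white fill color, the equal panel widths, and the prompt wording are all assumptions made for illustration. The foreground mask used for background removal is taken as an input here, since the abstract does not say how it is obtained. The resulting canvas and mask would then be passed, together with the prompt, to a text-conditioned inpainting model.

```python
import numpy as np


def make_diptych(reference, foreground=None, background=255):
    """Build an incomplete diptych and the matching inpainting mask.

    reference:  H x W x 3 uint8 array holding the subject image.
    foreground: optional H x W array, nonzero where the subject is
                (hypothetical input; how it is computed is not covered here).
    Returns (canvas, mask). canvas is H x 2W x 3 with the reference in the
    left panel and a blank right panel. mask is H x 2W with 1 marking the
    pixels the inpainting model should fill.
    """
    if reference.ndim != 3 or reference.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    h, w, _ = reference.shape

    left = reference.copy()
    if foreground is not None:
        # Blank out everything that is not the subject, so that background
        # content cannot leak into the generated panel.
        left[~np.asarray(foreground, dtype=bool)] = background

    canvas = np.full((h, 2 * w, 3), background, dtype=np.uint8)
    canvas[:, :w] = left              # left panel: reference image

    mask = np.zeros((h, 2 * w), dtype=np.uint8)
    mask[:, w:] = 1                   # right panel: region to inpaint
    return canvas, mask


def diptych_prompt(subject, target):
    """Illustrative prompt wording (an assumption, not quoted from the paper)."""
    return (
        f"A diptych with two side-by-side images of the same {subject}. "
        f"On the left, a photo of {subject}. On the right, {target}."
    )
```

A call such as `make_diptych(image, foreground=subject_mask)` yields the two arrays an inpainting pipeline typically expects (an image and a same-sized mask). Any resizing to the model's native resolution is left out of this sketch.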
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · AI in cancer detection · Cell Image Analysis Techniques
Methods: Softmax · Attention Is All You Need · Inpainting
