Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance

Kelvin C.K. Chan; Yang Zhao; Xuhui Jia; Ming-Hsuan Yang; Huisheng Wang

arXiv:2405.01356·cs.CV·May 3, 2024

Open Access

TL;DR

This paper introduces Subject-Agnostic Guidance (SAG), a method for subject-driven text-to-image synthesis that balances the influence of reference images against the text prompt, so that outputs stay consistent with both the given subject and the attributes the prompt describes.

Contribution

The paper proposes SAG, a simple yet effective guidance technique that improves subject-driven image synthesis by constructing a subject-agnostic condition and applying dual classifier-free guidance.

Findings

01 Significant quality improvements in image synthesis results.

02 Effective in both optimization-based and encoder-based methods.

03 Validated through evaluations and user studies.

Abstract

In subject-driven text-to-image synthesis, the synthesis process tends to be heavily influenced by the reference images provided by users, often overlooking crucial attributes detailed in the text prompt. In this work, we propose Subject-Agnostic Guidance (SAG), a simple yet effective solution to remedy the problem. We show that through constructing a subject-agnostic condition and applying our proposed dual classifier-free guidance, one could obtain outputs consistent with both the given subject and input text prompts. We validate the efficacy of our approach through both optimization-based and encoder-based methods. Additionally, we demonstrate its applicability in second-order customization methods, where an encoder-based model is fine-tuned with DreamBooth. Our approach is conceptually simple and requires only minimal code modifications, but leads to substantial quality…
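The abstract names dual classifier-free guidance but does not give its formula. The sketch below shows one plausible form, modeled on the usual two-condition extension of classifier-free guidance; it is an assumption, not the authors' exact rule. The function name `dual_cfg`, the weight names `w_agnostic` and `w_subject`, and their default values are hypothetical. The three inputs stand for a denoiser's noise predictions under an empty condition, the subject-agnostic condition, and the full subject-aware condition. How the subject-agnostic condition is built is not covered here.

```python
import numpy as np


def dual_cfg(eps_uncond, eps_agnostic, eps_subject,
             w_agnostic=7.5, w_subject=2.0):
    """Combine three noise predictions with two guidance weights.

    Assumed form (not taken from the paper):
        eps = eps_uncond
              + w_agnostic * (eps_agnostic - eps_uncond)
              + w_subject  * (eps_subject  - eps_agnostic)

    The first difference pushes the sample toward the text prompt
    without subject information; the second adds the subject on top.
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_agnostic = np.asarray(eps_agnostic, dtype=float)
    eps_subject = np.asarray(eps_subject, dtype=float)

    text_direction = eps_agnostic - eps_uncond      # prompt-only guidance
    subject_direction = eps_subject - eps_agnostic  # subject-specific guidance
    return (eps_uncond
            + w_agnostic * text_direction
            + w_subject * subject_direction)


if __name__ == "__main__":
    # Toy stand-ins for the three denoiser outputs at one sampling step.
    uncond = np.zeros(3)
    agnostic = np.ones(3)
    subject = np.full(3, 2.0)
    print(dual_cfg(uncond, agnostic, subject))
```

Under this assumed form, setting `w_subject` to 0 reduces the rule to ordinary classifier-free guidance on the subject-agnostic condition, and setting both weights to 1 returns the subject-aware prediction unchanged. Raising `w_agnostic` relative to `w_subject` would shift the balance toward the text prompt, which is the trade-off the abstract describes.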

Taxonomy

Topics: Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction