Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance

Dongmin Park; Sebin Kim; Taehong Moon; Minkyu Kim; Kangwook Lee; Jaewoong Cho

arXiv:2410.22376 · cs.LG · September 30, 2025

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces R2F, a training-free method that enhances diffusion models' ability to generate rare concept compositions by leveraging LLM guidance, significantly improving accuracy and alignment in text-to-image synthesis.

Contribution

The paper presents R2F, a novel training-free approach that uses LLM guidance to improve the generation of rare concepts in diffusion models.

Findings

01

R2F outperforms existing models like SD3.0 and FLUX by up to 28.1% in T2I alignment.

02

Empirical and theoretical analyses show that exposing frequent concepts related to the target rare concepts during sampling improves the composition of rare concepts.

03

R2F is flexible and can be integrated with various pre-trained diffusion models and LLMs.

Abstract

State-of-the-art text-to-image (T2I) diffusion models often struggle to generate rare compositions of concepts, e.g., objects with unusual attributes. In this paper, we show that the compositional generation power of diffusion models on such rare concepts can be significantly enhanced by Large Language Model (LLM) guidance. We start with an empirical and theoretical analysis, demonstrating that exposing frequent concepts relevant to the target rare concepts during the diffusion sampling process yields more accurate concept composition. Based on this, we propose a training-free approach, R2F, that plans and executes the overall rare-to-frequent concept guidance throughout diffusion inference by leveraging the abundant semantic knowledge in LLMs. Our framework works with any pre-trained diffusion model and LLM, and can be seamlessly integrated with the region-guided…
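The "plan" step described in the abstract can be pictured as asking an LLM to pair each rare concept in a prompt with a frequent, semantically related concept, then deriving a frequent-concept prompt from those pairs. The sketch below is a minimal illustration under stated assumptions: the instruction text, the JSON reply format, the names `ConceptPair`, `build_request`, `parse_plan` and `frequent_prompt`, and the example concepts are all hypothetical and are not taken from the R2F code. The LLM call itself is left out.

```python
import json
from dataclasses import dataclass

# Hypothetical instruction text; the actual prompts R2F sends to its LLM are not reproduced here.
INSTRUCTION = (
    "For the text-to-image prompt below, identify each rare concept and "
    "propose a frequent, semantically related concept that a diffusion model "
    "has likely seen often. Answer as a JSON list of objects with keys "
    "'rare' and 'frequent'.\nPrompt: {prompt}"
)


@dataclass
class ConceptPair:
    rare: str      # phrase as written in the user prompt
    frequent: str  # more common stand-in proposed by the LLM (hypothetical format)


def build_request(prompt: str) -> str:
    """Fill the hypothetical instruction template with the user prompt."""
    return INSTRUCTION.format(prompt=prompt)


def parse_plan(llm_reply: str) -> list[ConceptPair]:
    """Parse the assumed JSON reply into rare/frequent pairs."""
    items = json.loads(llm_reply)
    return [ConceptPair(rare=item["rare"], frequent=item["frequent"]) for item in items]


def frequent_prompt(prompt: str, plan: list[ConceptPair]) -> str:
    """Swap each rare phrase for its frequent counterpart to get the guidance prompt."""
    for pair in plan:
        prompt = prompt.replace(pair.rare, pair.frequent)
    return prompt


if __name__ == "__main__":
    # Illustrative reply, written by hand rather than produced by a real LLM.
    reply = '[{"rare": "a hairy frog", "frequent": "a hairy animal"}]'
    plan = parse_plan(reply)
    print(frequent_prompt("a hairy frog on a leaf", plan))  # a hairy animal on a leaf
```

Keeping the plan as structured pairs, rather than free text, makes it straightforward to hand both the rare and the frequent prompt to a sampler that decides per step which one to condition on.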

Peer Reviews

Decision · ICLR 2025 Spotlight

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1) The proposed rare-to-frequent prompt rewrite is novel and effective for generating images of rare concepts. 2) The empirical results look promising. 3) Solid empirical results are provided to validate the effectiveness of the method. 4) A new benchmark, RareBench, is provided to facilitate research on rare-concept image generation. 5) Code and implementation details are provided to ensure the reproducibility of the method.

Weaknesses

(1) The method requires alternating among a set of prompts during the denoising process, which makes multi-step inference unavoidable. Therefore, this design might not work well with current state-of-the-art acceleration methods, which reduce the number of denoising steps to 4 or even fewer. (2) There is a small gap between the theoretical analysis and the empirical method. In the theoretical analysis, the authors study the scenario of linear interpolation of scores produced by different…
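The gap Reviewer 01 describes can be made concrete. At a fixed noisy input and timestep, the score is a fixed negative multiple of the predicted noise, so linearly interpolating two scores is equivalent to interpolating the two noise predictions. Alternating prompts instead uses one prediction per step. The pure-Python sketch below uses hypothetical function names and toy vectors standing in for model outputs; it shows that averaging two alternating steps reproduces a 0.5 interpolation only when both predictions are held fixed across those steps, which a real denoising trajectory does not guarantee.

```python
from typing import Sequence


def interpolate(eps_freq: Sequence[float], eps_rare: Sequence[float], weight: float) -> list[float]:
    """Mix two noise predictions elementwise; `weight` is the share of the frequent one."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(eps_freq) != len(eps_rare):
        raise ValueError("predictions must have the same length")
    return [weight * f + (1.0 - weight) * r for f, r in zip(eps_freq, eps_rare)]


def alternate(eps_freq: Sequence[float], eps_rare: Sequence[float], step: int) -> list[float]:
    """Use a single prediction per step: frequent on even steps, rare on odd ones."""
    return list(eps_freq if step % 2 == 0 else eps_rare)


if __name__ == "__main__":
    # Toy stand-ins for the model's predictions under each prompt, held fixed on purpose.
    eps_f, eps_r = [1.0, 2.0], [3.0, 0.0]
    mixed = interpolate(eps_f, eps_r, weight=0.5)
    two_steps = [alternate(eps_f, eps_r, s) for s in (0, 1)]
    averaged = [(a + b) / 2 for a, b in zip(*two_steps)]
    print(mixed, averaged)  # both [2.0, 1.0] because the inputs never change
```

The same toy view also illustrates weakness (1): with only 4 denoising steps, this even/odd alternation exposes the frequent prediction on just 2 of them.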

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The observation about alternating prompts in diffusion-based models is important. 2. Both global and region-based generation are proposed. 3. Detailed visualizations are provided.

Weaknesses

1. The current design for scheduling the selection between frequent and rare compositions of concepts is a bit ad hoc. You always use the frequent composition at the beginning and then start randomly selecting a composition after a fixed point. Based on your theoretical analysis, can any additional guidance be included or used to determine the selection of compositions? 2. In your examples, each rare composition has only two concepts. How do you generalize your approach to more complicat…
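Reviewer 02's description of the schedule (frequent composition first, random selection after a fixed point) can be written as a short function. This is a sketch of the reviewer's reading, not the released R2F scheduler; `composition_schedule`, `switch_step` and the seeded uniform random choice are assumptions made for illustration.

```python
import random


def composition_schedule(num_steps: int, switch_step: int, seed: int = 0) -> list[str]:
    """Label each denoising step with the composition to condition on.

    Hypothetical sketch of Reviewer 02's description: steps before
    `switch_step` always use the frequent composition; every later step
    picks the frequent or the rare composition uniformly at random.
    """
    if not 0 <= switch_step <= num_steps:
        raise ValueError("switch_step must lie in [0, num_steps]")
    rng = random.Random(seed)  # local RNG so the schedule is reproducible
    schedule = ["frequent"] * switch_step
    schedule += [rng.choice(["frequent", "rare"]) for _ in range(num_steps - switch_step)]
    return schedule


if __name__ == "__main__":
    print(composition_schedule(num_steps=10, switch_step=4, seed=0))
```

Passing an explicit seed keeps the schedule reproducible across runs. For prompts with more than two concepts, one hypothetical extension is to keep a separate switch step for each rare/frequent pair, which is the kind of generalization the reviewer asks about.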

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- This paper is well-written and easy to follow. - The method is training-free. Experimental results show that R2F outperforms previous models on various metrics. - It brings a new task to compositional generation or text-to-image generation.

Weaknesses

- Regarding applications, rare concept composition generation is still a relatively niche area, although I acknowledge that it is indeed a novel task within compositional generation. Have you considered exploring a broader range of application scenarios? - Regarding computational cost, this paper adopts an approach similar to LMD to enhance R2F, resulting in R2F+, which involves substantial latent and gradient computations. A detailed comparison of computational and memory overhead with other methods is…

Code & Models

Repositories

krafton-ai/rare-to-frequent
PyTorch · Official

Taxonomy

Methods: Diffusion