Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park, Sebin Kim, Taehong Moon, Minkyu Kim, Kangwook Lee, Jaewoong Cho

TL;DR
This paper introduces R2F, a training-free method that enhances diffusion models' ability to generate rare concept compositions by leveraging LLM guidance, significantly improving accuracy and alignment in text-to-image synthesis.
Contribution
The paper presents R2F, a novel approach that uses LLM guidance to improve rare-concept generation in diffusion models without any additional training.
Findings
R2F outperforms existing models like SD3.0 and FLUX by up to 28.1% in T2I alignment.
Empirical and theoretical analyses show that exposing frequent concepts related to the target rare concepts during diffusion sampling yields more accurate composition of rare concepts.
R2F is flexible and can be integrated with various pre-trained diffusion models and LLMs.
Abstract
State-of-the-art text-to-image (T2I) diffusion models often struggle to generate rare compositions of concepts, e.g., objects with unusual attributes. In this paper, we show that the compositional generation power of diffusion models on such rare concepts can be significantly enhanced by Large Language Model (LLM) guidance. We start with empirical and theoretical analysis, demonstrating that exposing frequent concepts relevant to the target rare concepts during the diffusion sampling process yields more accurate concept composition. Based on this, we propose a training-free approach, R2F, that plans and executes the overall rare-to-frequent concept guidance throughout diffusion inference by leveraging the abundant semantic knowledge in LLMs. Our framework is flexible across pre-trained diffusion models and LLMs, and can be seamlessly integrated with the region-guided…
Peer Reviews
Decision: ICLR 2025 Spotlight
1) The proposed rare-to-frequent prompt rewriting is novel and effective for generating rare-concept images. 2) The empirical results look promising. 3) Solid empirical results are provided to validate the effectiveness of the method. 4) A new benchmark, RareBench, is provided to facilitate research on rare-concept image generation. 5) Code and a detailed implementation are provided to ensure the reproducibility of the method.
(1) The method requires alternating among a set of prompts during the denoising process, which makes multi-step inference unavoidable. Therefore, this design might not work well with current state-of-the-art acceleration methods, which reduce the number of denoising steps to four or fewer. (2) There is a small gap between the theoretical analysis and the empirical method. In the theoretical analysis, the authors study the scenario of linear interpolation of scores produced by different…
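Point (2) above refers to linearly interpolating the scores produced under different prompts. The snippet below is a toy illustration of that idea, not the paper's analysis: it assumes (hypothetically) that each prompt induces an isotropic unit-variance Gaussian, whose score is ∇ log p(x) = μ − x. Under that assumption, a convex combination λ·s_freq + (1 − λ)·s_rare equals the score of a Gaussian centered at λ·μ_freq + (1 − λ)·μ_rare. The helper names and the Gaussian model are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def gaussian_score(x: Vector, mu: Vector) -> Vector:
    """Score of an isotropic unit-variance Gaussian N(mu, I): grad log p(x) = mu - x.

    Toy assumption: each prompt is modeled as one such Gaussian.
    """
    return [m - xi for xi, m in zip(x, mu)]


def interpolated_score(x: Vector, mu_rare: Vector, mu_freq: Vector, lam: float) -> Vector:
    """Linear interpolation of two scores: lam * s_freq + (1 - lam) * s_rare."""
    s_rare = gaussian_score(x, mu_rare)
    s_freq = gaussian_score(x, mu_freq)
    return [lam * f + (1.0 - lam) * r for r, f in zip(s_rare, s_freq)]


def interpolated_mean(mu_rare: Vector, mu_freq: Vector, lam: float) -> Vector:
    """Mean of the Gaussian whose score equals the interpolated score."""
    return [lam * f + (1.0 - lam) * r for r, f in zip(mu_rare, mu_freq)]


if __name__ == "__main__":
    x = [0.5, -1.0]
    mu_rare, mu_freq, lam = [2.0, 0.0], [0.0, 4.0], 0.25
    # Both printed vectors match: the mixed score is the score of N(interpolated mean, I).
    print(interpolated_score(x, mu_rare, mu_freq, lam))
    print(gaussian_score(x, interpolated_mean(mu_rare, mu_freq, lam)))
```

Beyond the Gaussian case, λ∇log p_freq + (1 − λ)∇log p_rare is the score of the unnormalized geometric mixture p_freq^λ · p_rare^(1−λ), not of the additive mixture λ·p_freq + (1 − λ)·p_rare.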
1. The observation about alternating prompts in diffusion-based models is important. 2. Both global and region-based generation are proposed. 3. Detailed visualizations are provided.
1. The current design for scheduling the selection between frequent and rare concept compositions is somewhat ad hoc. You always use the frequent composition at the beginning and then switch to random selection of compositions after a fixed point. Based on your theoretical analysis, could additional guidance be included or used to determine which composition of concepts to select? 2. In your examples, each rare composition contains only two concepts. How do you generalize your approach to more complicat…
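The schedule criticized in point 1 above (frequent composition at the start, random selection after a fixed step) can be sketched as follows. This is a minimal sketch of the schedule as the reviewer describes it, not R2F's actual selection rule; the function `prompt_schedule` and its parameters `switch_step` and `p_frequent` are hypothetical.

```python
import random
from typing import List, Optional


def prompt_schedule(
    rare_prompt: str,
    frequent_prompt: str,
    num_steps: int,
    switch_step: int,
    p_frequent: float = 0.5,
    seed: Optional[int] = None,
) -> List[str]:
    """Return the prompt used at each denoising step (hypothetical helper).

    Steps [0, switch_step) always use the frequent prompt. From switch_step on,
    each step picks the frequent prompt with probability p_frequent, otherwise
    the rare prompt.
    """
    if not 0 <= switch_step <= num_steps:
        raise ValueError("switch_step must lie in [0, num_steps]")
    rng = random.Random(seed)  # local RNG so the schedule is reproducible per seed
    schedule = [frequent_prompt] * switch_step
    for _ in range(switch_step, num_steps):
        schedule.append(frequent_prompt if rng.random() < p_frequent else rare_prompt)
    return schedule


if __name__ == "__main__":
    # Illustrative prompts only.
    for step, prompt in enumerate(prompt_schedule("a furry frog", "a furry dog", 8, 3, seed=0)):
        print(step, prompt)
```

With a few-step sampler (e.g. `num_steps=4`), a schedule like this leaves very few steps in which prompts can alternate, which is the concern raised in point (1) of the first review.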
- This paper is well-written and easy to follow. - The method is training-free, and experimental results show that R2F outperforms previous models on various metrics. - It introduces a new task to compositional generation and text-to-image generation.
- For applications, rare concept composition generation is still a relatively niche area, although I acknowledge that it is indeed a novel task within compositional generation. Have you considered exploring a broader range of application scenarios? - Regarding computational cost, this paper adopts an approach similar to LMD to enhance R2F, resulting in R2F+, which involves substantial latent and gradient computations. A detailed comparison of computational and memory overhead with other methods is…
Taxonomy
Methods: Diffusion
