Cached Multi-Lora Composition for Multi-Concept Image Generation
Xiandong Zou, Mingzhu Shen, Christos-Savvas Bouganis, Yiren Zhao

TL;DR
This paper introduces CMLoRA, a training-free framework that uses frequency-domain strategies to sequence and cache multiple LoRAs, improving both the quality and the efficiency of multi-concept image generation.
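For context, the simplest way to compose several LoRAs is to merge their low-rank updates into the base weights. This is the kind of naive integration that one of the peer reviews below contrasts CMLoRA with. The sketch shows that baseline only, not CMLoRA. The function name `merge_loras` and the per-LoRA weights are illustrative assumptions, while the $\alpha/r$ scaling follows the original LoRA formulation.

```python
import numpy as np


def merge_loras(W, loras, weights):
    """Naive LoRA merging baseline (not CMLoRA).

    Computes W' = W + sum_i w_i * (alpha_i / r_i) * B_i @ A_i, where each LoRA
    is a tuple (A, B, alpha) with A of shape (r, d_in) and B of shape (d_out, r).
    """
    if len(loras) != len(weights):
        raise ValueError("need exactly one weight per LoRA")
    merged = np.asarray(W, dtype=float).copy()  # never modify the base weights in place
    for (A, B, alpha), w in zip(loras, weights):
        r = A.shape[0]  # LoRA rank
        merged += w * (alpha / r) * (B @ A)
    return merged


# Two toy LoRAs on a 2x3 weight matrix.
W = np.zeros((2, 3))
lora_a = (np.ones((1, 3)), np.ones((2, 1)), 1.0)  # rank 1, alpha 1
lora_b = (np.ones((2, 3)), np.ones((2, 2)), 2.0)  # rank 2, alpha 2
print(merge_loras(W, [lora_a, lora_b], weights=[0.5, 1.0]))
```

Here every entry of the merged matrix is 2.5. Because all updates are summed into the same weights at every denoising step, LoRAs that emphasize different kinds of features can interfere with one another.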
Contribution
It proposes a novel frequency-domain-based sequencing strategy and a caching framework for multi-LoRA composition, which mitigate semantic conflicts between LoRAs and improve generation quality and efficiency.
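The excerpt does not spell out how the sequencing strategy orders LoRAs, so the following is only a hypothetical sketch. It ranks LoRAs by a per-LoRA high-frequency score and lets one "dominant" LoRA be active at a time, switching every `switch_every` denoising steps. The score values, the `switch_every` parameter, and whether high-frequency LoRAs should come first (`high_first`) are all assumptions for illustration.

```python
def frequency_ordered_schedule(
    scores: dict[str, float],
    num_steps: int,
    switch_every: int = 5,
    high_first: bool = True,
) -> list[str]:
    """Hypothetical frequency-aware schedule: the dominant LoRA at each step.

    scores maps a LoRA name to an assumed high-frequency score (higher means
    the LoRA mainly affects edges/textures, lower means structure/color).
    """
    if not scores:
        raise ValueError("need at least one LoRA")
    if switch_every < 1:
        raise ValueError("switch_every must be >= 1")
    # Sort by score; Python's sort is stable, so ties keep insertion order.
    order = sorted(scores, key=lambda name: scores[name], reverse=high_first)
    # Cycle through the ordered LoRAs, holding each for `switch_every` steps.
    return [order[(t // switch_every) % len(order)] for t in range(num_steps)]


scores = {"character": 0.3, "style": 0.8, "background": 0.1}
print(frequency_ordered_schedule(scores, num_steps=8, switch_every=2))
# ['style', 'style', 'character', 'character', 'background', 'background', 'style', 'style']
```

Setting `high_first=False` reverses the order, so the same helper can test either hypothesis about which frequency band should lead the denoising process.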
Findings
CMLoRA outperforms state-of-the-art methods in CLIPScore by 2.19%.
CMLoRA achieves an 11.25% higher win rate in evaluations judged by a multimodal large language model (MLLM).
The frequency-based sequencing reduces semantic conflicts in LoRA composition.
Abstract
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models, enabling precise rendering of multiple distinct elements, such as characters and styles, in multi-concept image generation. However, current approaches face significant challenges when composing these LoRAs for multi-concept image generation, resulting in diminished generated image quality. In this paper, we initially investigate the role of LoRAs in the denoising process through the lens of the Fourier frequency domain. Based on the hypothesis that applying multiple LoRAs could lead to "semantic conflicts", we find that certain LoRAs amplify high-frequency features such as edges and textures, whereas others mainly focus on low-frequency elements, including the overall structure and smooth color gradients. Building on these insights, we devise a frequency domain based sequencing strategy to…
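One way to make the abstract's high- versus low-frequency distinction concrete is to measure what fraction of a signal's spectral energy lies above a radial cutoff in the 2-D Fourier domain. The sketch below applies this to a LoRA's effect, for example the difference between a latent denoised with and without that LoRA. The function names, the `cutoff` of 0.25 of the Nyquist radius, and the 0.5 classification `threshold` are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def high_frequency_ratio(x: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of 2-D spectral energy above `cutoff` (1.0 = Nyquist radius)."""
    energy = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2
    h, w = x.shape
    # Frequencies in cycles/sample, shifted to match the shifted spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    # Radial frequency, normalized so that 0.5 cycles/sample maps to 1.0.
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / 0.5
    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[radius > cutoff].sum() / total)


def split_by_frequency(deltas: dict[str, np.ndarray], cutoff: float = 0.25,
                       threshold: float = 0.5) -> tuple[list[str], list[str]]:
    """Hypothetical split of LoRAs into high- and low-frequency groups."""
    high, low = [], []
    for name, delta in deltas.items():
        ratio = high_frequency_ratio(delta, cutoff)
        (high if ratio >= threshold else low).append(name)
    return high, low


# Toy "LoRA effects": a fine checkerboard (texture) and a slow cosine (structure).
i, j = np.indices((16, 16))
texture = np.where((i + j) % 2 == 0, 1.0, -1.0)
structure = np.cos(2 * np.pi * j / 16)
print(split_by_frequency({"texture_lora": texture, "structure_lora": structure}))
# (['texture_lora'], ['structure_lora'])
```

In practice one would likely average the ratio over channels and denoising steps. The cutoff matters: a lower cutoff moves mid-frequency content into the "high" group.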
Peer Reviews
Decision: ICLR 2025 Poster
The paper is generally well written and (at least for this class of papers) rather easy to follow. The authors' claims, on which the paper's discourse is based, are verified through evaluations, which can become clear if properly exemplified.
Even though the writing is good, the quality of the visuals (e.g., Figs. 4 and 6) could be improved. The lack of visual comparisons is unexpected, given that most of the evaluations showing an advantage for the proposed method are either purely subjective or extremely difficult to quantify. At least in terms of quantitative evaluation (CLIPScore), the introduction of the cache mechanism does not show consistent results, but rather mixed ones. A systemic improvement/degradation of
1. Frequency-domain analysis for multi-component generation is indeed an interesting idea.
2. The proposed solution is simple and clear (although the high-level insight is not very obvious).
3. The experiments do a good job of explaining the effectiveness of the solution.
1. It is not clear why the frequency domain is needed to solve the multi-component generation task. A clear investigation and analysis of how the authors arrived at this solution would further strengthen the contribution of the work. In particular, more analysis is needed to explain why the focus should shift from the spatial domain to the frequency domain.
2. The observation that some LoRAs amplify high-frequency features while others focus on low-frequency elements is based on a naïve experiment. More analysis or theoret
1. The paper introduces a novel Fourier-based approach to address the challenge of multi-LoRA composition by partitioning LoRA modules into high- and low-frequency categories. This frequency-aware sequencing strategy is innovative, as it moves beyond the typical naive integration of LoRAs by leveraging the frequency domain to systematically order their application during inference. This approach effectively mitigates semantic conflicts and represents a creative combination of LoRA adaptation wi
1. What are the failure cases? A few visual examples of failed outputs would provide more insight into the limitations of the CMLoRA method.
2. How were the caching hyperparameters $c_1$ and $c_2$ chosen, and how sensitive is the model’s performance to variations in them? Furthermore, there is limited discussion of how the caching interval affects final performance in terms of both computational efficiency and image quality. Additional experiments that explore the impact of varying th
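The review's question about the caching interval is easier to reason about with a toy cost model. The hypothetical sketch below assumes that a cached LoRA's contribution is recomputed every `interval` denoising steps and reused in between, with two LoRA groups refreshed at intervals named after the review's $c_1$ and $c_2$. How CMLoRA actually assigns $c_1$ and $c_2$ is not stated in this excerpt, and the class and function names are illustrative. The model counts LoRA forward passes, so it speaks only to computational cost, not to image quality.

```python
from typing import Callable, Dict, List


class IntervalCache:
    """Recompute a value every `interval` steps; reuse the cached copy otherwise."""

    def __init__(self, interval: int):
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self._value = None
        self._filled = False
        self.recomputes = 0

    def get(self, step: int, compute: Callable[[], object]) -> object:
        # Refresh on the first call and on every multiple of the interval.
        if not self._filled or step % self.interval == 0:
            self._value = compute()
            self._filled = True
            self.recomputes += 1
        return self._value


def count_lora_passes(groups: Dict[str, List[str]], intervals: Dict[str, int],
                      num_steps: int) -> int:
    """Total LoRA forward passes over `num_steps` with per-group cache intervals."""
    caches = [IntervalCache(intervals[group])
              for group, names in groups.items() for _ in names]
    for step in range(num_steps):
        for cache in caches:
            cache.get(step, lambda: 0.0)  # stand-in for a real LoRA forward pass
    return sum(cache.recomputes for cache in caches)


groups = {"group_c1": ["lora_a"], "group_c2": ["lora_b", "lora_c"]}
print(count_lora_passes(groups, {"group_c1": 1, "group_c2": 5}, num_steps=50))  # 70
print(count_lora_passes(groups, {"group_c1": 1, "group_c2": 1}, num_steps=50))  # 150
```

Larger intervals cut LoRA compute roughly in proportion. However, stale cached contributions are exactly what could cost image quality, which is why a sensitivity study over $c_1$ and $c_2$ is informative.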
