LoRAtorio: An intrinsic approach to LoRA Skill Composition

Niki Foteinopoulou; Ignas Budvytis; Stephan Liwicki

arXiv:2508.11624·cs.CV·August 18, 2025

TL;DR

LoRAtorio introduces a train-free, intrinsic method for composing multiple LoRA adapters in diffusion models, improving personalization and open-ended skill combination without retraining.

Contribution

The paper proposes a novel latent-space, similarity-based framework for composing multiple LoRA adapters, addressing domain drift and enabling dynamic adapter selection at inference time.
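To make the idea of similarity-based composition in latent space concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the non-overlapping patch layout, the temperature `tau`, and above all the choice to weight each adapter by a softmax over its cosine distance to the base model's prediction are assumptions made for illustration.

```python
import numpy as np


def patch_cosine_similarity(a, b, patch):
    """Cosine similarity between a and b over non-overlapping spatial patches.

    a, b: arrays of shape (C, H, W); H and W must be divisible by `patch`.
    Returns an array of shape (H // patch, W // patch).
    """
    c, h, w = a.shape
    gh, gw = h // patch, w // patch

    def to_patches(x):
        # (C, H, W) -> (gh, gw, C * patch * patch): one flat vector per patch
        x = x.reshape(c, gh, patch, gw, patch)
        return x.transpose(1, 3, 0, 2, 4).reshape(gh, gw, -1)

    pa, pb = to_patches(a), to_patches(b)
    dot = (pa * pb).sum(axis=-1)
    norm = np.linalg.norm(pa, axis=-1) * np.linalg.norm(pb, axis=-1)
    return dot / np.maximum(norm, 1e-8)  # guard against zero-norm patches


def compose_lora_predictions(lora_eps, base_eps, patch=2, tau=1.0):
    """Blend K LoRA noise predictions using per-patch softmax weights.

    ASSUMPTION (illustrative only): an adapter gets more weight in patches
    where its prediction diverges more from the base model's prediction.
    Returns the blended prediction (C, H, W) and the weights (K, gh, gw).
    """
    sims = np.stack(
        [patch_cosine_similarity(e, base_eps, patch) for e in lora_eps]
    )  # (K, gh, gw)
    scores = (1.0 - sims) / tau  # cosine distance, scaled by temperature
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over adapters

    # Upsample patch-level weights back to latent resolution: (K, H, W)
    full = weights.repeat(patch, axis=1).repeat(patch, axis=2)
    stacked = np.stack(lora_eps)  # (K, C, H, W)
    blended = (full[:, None] * stacked).sum(axis=0)
    return blended, weights
```

In a real pipeline, `lora_eps` and `base_eps` would be the denoiser outputs at one sampling step, so the weights would be recomputed at every step. Flipping the sign of `scores` gives the opposite convention, which favours adapters that stay close to the base model.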

Findings

01

Achieves up to 1.3% improvement in CLIPScore

02

72.43% win rate in GPT-4V evaluations

03

Effective generalization across multiple diffusion models

Abstract

Low-Rank Adaptation (LoRA) has become a widely adopted technique in text-to-image diffusion models, enabling the personalisation of visual concepts such as characters, styles, and objects. However, existing approaches struggle to effectively compose multiple LoRA adapters, particularly in open-ended settings where the number and nature of required skills are not known in advance. In this work, we present LoRAtorio, a novel train-free framework for multi-LoRA composition that leverages intrinsic model behaviour. Our method is motivated by two key observations: (1) LoRA adapters trained on narrow domains produce denoised outputs that diverge from the base model, and (2) when operating out-of-distribution, LoRA outputs show behaviour closer to the base model than when conditioned in distribution. The balance between these two observations allows for exceptional performance in the single…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 4·Confidence 4

Strengths

1. The paper is well structured and easy to follow.
2. The proposed approach shows better results as the number of active LoRA adapters increases.
3. The re-centering of the unconditional noise could be used independently.
4. Both UNet- and DiT-based models are evaluated.
5. The human and VLM-based evaluations are fully described.
6. The appendix is extensive.

Weaknesses

1. The multi-LoRA composition task with dynamic LoRA selection probably requires a more detailed description; as it stands, it lacks motivation (at least some potential use cases).
2. The majority of the comparisons use CLIPScore, which is a good proxy metric; however, a more extensive human or VLM-based evaluation is suggested.
3. Only compositions of LoRAs for Character, Style, and Background are considered. No compositions with LoRAs for faster inference (e.g., LCM) are checked.
4. See questions.

Reviewer 02·Rating 4·Confidence 4

Strengths

1. The authors propose a spatially aware similarity metric as a proxy for a LoRA adapter's confidence, with sound theoretical motivation.
2. The authors extend the task of multi-LoRA composition to a dynamic module selection setting, which is a good, real-world skill composition scenario.

Weaknesses

1. The first contribution seems incremental: MultLFG (the second-best method) proposes "... training-free frequency-aware multi-LoRA merging. The key idea is to decompose LoRA-based noise predictions into frequency subbands and perform adaptive merging based on relevance scores." (https://arxiv.org/pdf/2505.20525), whereas this paper proposes patched cosine distance instead of frequency subbands.
2. The second contribution, re-centering, is, per your results in Table 6a, only better by 0.01 (3

Reviewer 03·Rating 2·Confidence 4

Strengths

The paper demonstrates originality by proposing a train-free, intrinsically guided framework for multi-LoRA composition, departing from the reliance on weight merging or learned gating. The quality of the work is evident in the methodology, including spatial patch-based weighting, re-centered guidance, and dynamic module selection. The paper is clearly written, with effective visualizations and thorough empirical support.

Weaknesses

While the paper presents an innovative and effective approach, there are several notable weaknesses that merit attention. First, the authors do not release their code, which hinders reproducibility and weakens the reliability of the claimed results. Second, the core mechanism—spatial patch-based weighting—raises concerns when dealing with heterogeneous LoRA types. For example, style-oriented LoRAs may introduce global stylistic shifts across all spatial regions, while object-specific LoRAs affec
