Dynamic Training-Free Fusion of Subject and Style LoRAs
Qinglong Cao, Yuntian Chen, Chao Ma, Xiaokang Yang

TL;DR
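This paper introduces a dynamic, training-free framework for fusing subject and style LoRAs during image generation. At each LoRA-applied layer it uses the KL divergence between base-model and LoRA features to select fusion weights, and during denoising it applies gradient-based corrections derived from objective metrics such as CLIP, improving coherence without retraining.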
Contribution
The proposed method requires no retraining: it fuses subject and style LoRAs dynamically during generation, guided by feature divergence and objective metrics, and outperforms static fusion approaches.
Findings
Outperforms state-of-the-art LoRA fusion methods quantitatively
Achieves coherent subject-style synthesis across diverse combinations
Operates without any retraining during the generation process
Abstract
Recent studies have explored the combination of multiple LoRAs to simultaneously generate user-specified subjects and styles. However, most existing approaches fuse LoRA weights using static statistical heuristics that deviate from LoRA's original purpose of learning adaptive feature adjustments and ignore the randomness of sampled inputs. To address this, we propose a dynamic training-free fusion framework that operates throughout the generation process. During the forward pass, at each LoRA-applied layer, we dynamically compute the KL divergence between the base model's original features and those produced by subject and style LoRAs, respectively, and adaptively select the most appropriate weights for fusion. In the reverse denoising stage, we further refine the generation trajectory by dynamically applying gradient-based corrections derived from objective metrics such as CLIP and…
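The per-layer selection step described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the softmax normalisation of features, the KL direction (base relative to LoRA output), the rule of keeping the branch with the larger divergence, and all names (`kl_divergence`, `lora_delta`, `fuse_layer`) are assumptions, since the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def kl_divergence(base_feat, lora_feat, eps=1e-12):
    """Mean KL(p_base || p_lora) over rows.

    ASSUMPTION: features are turned into distributions with a softmax
    over the last dimension; the paper may normalise differently.
    """
    p = softmax(base_feat)
    q = softmax(lora_feat)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())


def lora_delta(x, A, B, scale=1.0):
    """Low-rank update scale * B A x for x of shape (n, d_in).

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return scale * (x @ A.T) @ B.T


def fuse_layer(x, W, subject, style):
    """Choose one LoRA branch for this layer based on KL divergence.

    subject and style are (A, B) tuples. Returns the layer output,
    the name of the chosen branch, and both divergences.
    """
    base = x @ W.T
    subj_out = base + lora_delta(x, *subject)
    style_out = base + lora_delta(x, *style)

    kl_subj = kl_divergence(base, subj_out)
    kl_style = kl_divergence(base, style_out)

    # ASSUMPTION: keep the branch that moves the features further from
    # the base model. The actual selection rule is not given in the abstract.
    if kl_subj >= kl_style:
        return subj_out, "subject", (kl_subj, kl_style)
    return style_out, "style", (kl_subj, kl_style)
```

Because the divergences are computed from the features of the current input, the chosen branch can differ between layers and between sampled inputs, which is the contrast the abstract draws with static weight-merging heuristics.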
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
