FreeFuse: Multi-Subject LoRA Fusion via Adaptive Token-Level Routing at Test Time
Yaoli Liu, Yao-Xiang Ding, Kun Zhou

TL;DR
FreeFuse is a training-free, inference-only framework for multi-subject text-to-image generation that dynamically fuses multiple subject LoRAs using adaptive token-level routing, ensuring high fidelity without additional training or masks.
Contribution
The paper introduces FreeFuse, an inference-time method that fuses multiple subject LoRAs without retraining or external segmentation. It uses the model's intrinsic semantic alignment to match subject-specific tokens to their image regions dynamically.
Findings
Outperforms existing methods in identity preservation.
Achieves high compositional fidelity in multi-subject generation.
Requires no additional training or external masks.
Abstract
This paper proposes FreeFuse, a training-free framework for multi-subject text-to-image generation through automatic fusion of multiple subject LoRAs. In contrast to prior studies that focus on retraining LoRA to alleviate feature conflicts, our analysis reveals that simply spatially confining the subject LoRA's output to its target region and preventing other LoRAs from directly intruding into this area is sufficient for effective mitigation. Accordingly, we implement Adaptive Token-Level Routing during the inference phase. We introduce FreeFuseAttn, a mechanism that exploits the flow matching model's intrinsic semantic alignment to dynamically match subject-specific tokens to their corresponding spatial regions at early denoising timesteps, thereby bypassing the need for external segmentors. FreeFuse distinguishes itself through high practicality: it necessitates no additional…
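The core idea in the abstract, confining each subject LoRA's output to its own region so that other LoRAs do not intrude, can be illustrated with a small NumPy sketch. This is not the authors' FreeFuseAttn implementation. The function names, the hard argmax assignment, the `threshold` parameter, and the assumption that a per-subject relevance score is already available for every image token are all hypothetical choices made for illustration.

```python
import numpy as np

def route_tokens(scores, threshold=0.0):
    """Assign each image token to at most one subject (illustrative only).

    scores: (num_subjects, num_tokens) array; a hypothetical relevance of
            each image token to each subject, e.g. derived from attention.
    Returns a boolean mask of the same shape.
    """
    num_tokens = scores.shape[1]
    winner = scores.argmax(axis=0)               # best subject per token
    masks = np.zeros(scores.shape, dtype=bool)
    masks[winner, np.arange(num_tokens)] = True  # one-hot over subjects
    masks &= scores > threshold                  # weak tokens stay background
    return masks

def fuse_lora_outputs(x, w_base, loras, masks):
    """Add each LoRA's low-rank update only inside that subject's region.

    x:      (num_tokens, d_in) token features
    w_base: (d_in, d_out) frozen base weight
    loras:  list of (A, B) pairs with A: (d_in, r), B: (r, d_out)
    masks:  (num_subjects, num_tokens) boolean routing mask
    """
    out = x @ w_base
    for (a, b), mask in zip(loras, masks):
        delta = (x @ a) @ b                      # this LoRA's update per token
        out = out + mask[:, None] * delta        # zeroed outside its region
    return out

# Toy example: 3 image tokens, 2 subjects.
scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.8, 0.0]])
masks = route_tokens(scores)

x = np.eye(3)
w_base = np.zeros((3, 2))
loras = [(np.ones((3, 1)), np.array([[1.0, 0.0]])),
         (np.ones((3, 1)), np.array([[0.0, 2.0]]))]
fused = fuse_lora_outputs(x, w_base, loras, masks)
```

In this toy setup the first token receives only the first LoRA's update, the second token only the second LoRA's, and the third token (no score above the threshold) keeps the base output. A real system would need to obtain the scores from the model itself, which is the part the abstract attributes to FreeFuseAttn.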
