MoLoRA: Composable Specialization via Per-Token Adapter Routing
Shrey Shah, Justin Wagle

TL;DR
MoLoRA introduces per-token adapter routing for multimodal and mixed-capability models. Instead of sending a whole request to a single adapter, it selects an adapter for each token, which enables modular, specialized, and efficient inference and outperforms larger models on reasoning tasks.
Contribution
The paper proposes per-token routing and MoLoRA (Mixture of LoRA) for composable specialization: adapters are selected dynamically per token, so modular expertise can be combined without retraining.
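A minimal sketch of per-token learned gating over LoRA adapters, in plain Python. It assumes a linear router, top-1 selection, a gate-weighted adapter update, and a single frozen linear layer. The class and method names (`MoLoRALayer`, `route`, `forward`) are hypothetical, and the paper's actual router design may differ.

```python
import math
import random

# Hypothetical sketch of per-token MoLoRA-style routing with top-1 learned gating.
# Assumptions (not from the paper): linear router, top-1 selection, the selected
# adapter's update is scaled by its gate probability, one frozen linear layer.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    mx = max(z)
    e = [math.exp(x - mx) for x in z]
    s = sum(e)
    return [x / s for x in e]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class MoLoRALayer:
    def __init__(self, d_in, d_out, num_adapters, rank, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)               # frozen base weight
        self.router = rand_matrix(num_adapters, d_in, rng)   # logits = router @ x
        # Adapter k adds B_k @ (A_k @ x), with A_k: rank x d_in and B_k: d_out x rank.
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(num_adapters)]
        self.B = [rand_matrix(d_out, rank, rng) for _ in range(num_adapters)]

    def route(self, x):
        """Pick one adapter for a single token; return (index, gate probability)."""
        probs = softmax(matvec(self.router, x))
        k = max(range(len(probs)), key=probs.__getitem__)
        return k, probs[k]

    def forward_token(self, x):
        """Base projection plus the gated low-rank update of the chosen adapter."""
        k, g = self.route(x)
        base = matvec(self.W, x)
        delta = matvec(self.B[k], matvec(self.A[k], x))
        return k, [b + g * d for b, d in zip(base, delta)]

    def forward(self, tokens):
        """Route every token independently; return (adapter ids, outputs)."""
        ids, outs = [], []
        for x in tokens:
            k, y = self.forward_token(x)
            ids.append(k)
            outs.append(y)
        return ids, outs

if __name__ == "__main__":
    layer = MoLoRALayer(d_in=4, d_out=3, num_adapters=3, rank=2)
    tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    ids, outs = layer.forward(tokens)
    print(ids)  # one adapter index per token; neighbouring tokens may differ
```

Because routing happens inside `forward_token`, two tokens of the same sequence can land on different adapters, which is the property the per-sequence approach lacks.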
Findings
MoLoRA outperforms larger models in reasoning benchmarks.
Specialization with MoLoRA delivers larger gains than scaling up model size.
Modular adapters enable flexible, efficient inference.
Abstract
Multi-adapter serving systems route entire sequences to a single adapter, forcing a choice when requests span multiple domains. This assumption fails in two important settings: (1) multimodal generation, where text and image tokens require different adapters within the same sequence, and (2) mixed-capability requests like "write code to solve this equation," which need expertise from multiple specialized adapters. We introduce per-token routing, which routes individual tokens to adapters based on either vocabulary structure (for multimodal models) or learned gating (for semantic specialization). Per-token routing is provably optimal, achieving work N for N tokens versus K·N for per-sequence routing with K adapter types. Our key contribution is MoLoRA (Mixture of LoRA), which enables composable specialization: load multiple domain-specific adapters and let a learned router select…
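The vocabulary-structure case and the N versus K·N work count can be illustrated with a small sketch. It assumes a shared vocabulary in which IDs below a boundary are text tokens and the rest are image tokens, and it reads "per-sequence routing with K adapter types" as running all N tokens through each of the K adapters. The boundary value `TEXT_VOCAB_SIZE`, the adapter names, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: route tokens by vocabulary range, then count adapter work.
# Assumption: IDs below TEXT_VOCAB_SIZE are text tokens, the rest image tokens.
TEXT_VOCAB_SIZE = 32000  # made-up split point in a shared text+image vocabulary

def route_by_vocab(token_ids, text_vocab_size=TEXT_VOCAB_SIZE):
    """Assign each token an adapter name using only its ID range."""
    return ["text_lora" if t < text_vocab_size else "image_lora" for t in token_ids]

def group_by_adapter(assignments):
    """Collect token positions per adapter so each adapter can run as one batch."""
    groups = defaultdict(list)
    for pos, name in enumerate(assignments):
        groups[name].append(pos)
    return dict(groups)

def per_token_work(groups):
    """Every token visits exactly one adapter, so total work is N."""
    return sum(len(positions) for positions in groups.values())

def per_sequence_work(num_tokens, num_adapter_types):
    """If the whole sequence must pass through each of K adapters: K * N."""
    return num_adapter_types * num_tokens

seq = [17, 905, 32001, 40512, 12]
groups = group_by_adapter(route_by_vocab(seq))
print(groups)                         # {'text_lora': [0, 1, 4], 'image_lora': [2, 3]}
print(per_token_work(groups))         # 5
print(per_sequence_work(len(seq), 2)) # 10
```

Grouping positions by adapter keeps per-token routing batch-friendly: each adapter still runs once over a contiguous batch, but only over the tokens assigned to it.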
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling
