MoLoRA: Composable Specialization via Per-Token Adapter Routing

Shrey Shah; Justin Wagle

arXiv:2603.15965 · cs.CL · March 18, 2026

Open Access

TL;DR

MoLoRA introduces per-token adapter routing for multimodal and mixed-capability models, enabling modular, specialized, and efficient inference by dynamically selecting adapters per token, outperforming larger models in reasoning tasks.

Contribution

The paper proposes per-token routing and MoLoRA for composable specialization, allowing dynamic adapter selection and modular expertise without retraining.

Findings

01 MoLoRA outperforms larger models on reasoning benchmarks.

02 Specialization with MoLoRA yields larger gains than scaling up the model.

03 Modular adapters enable flexible, efficient inference.

Abstract

Multi-adapter serving systems route entire sequences to a single adapter, forcing a choice when requests span multiple domains. This assumption fails in two important settings: (1) multimodal generation, where text and image tokens require different adapters within the same sequence, and (2) mixed-capability requests like "write code to solve this equation," which need expertise from multiple specialized adapters. We introduce per-token routing, which routes individual tokens to adapters based on either vocabulary structure (for multimodal models) or learned gating (for semantic specialization). Per-token routing is provably optimal, achieving work $N$ for $N$ tokens versus $K \cdot N$ for per-sequence routing with $K$ adapter types. Our key contribution is MoLoRA (Mixture of LoRA), which enables composable specialization: load multiple domain-specific adapters and let a learned router select…
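
The abstract describes two ways of choosing an adapter per token (vocabulary structure and learned gating) without showing how they might look in code. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names, the "text ids come before image ids" vocabulary layout, the top-1 linear gate, and the array shapes are all assumptions made for illustration.

```python
import numpy as np


def route_by_vocab(token_ids, text_vocab_size):
    """Hypothetical vocabulary-structure router.

    Assumes token ids below text_vocab_size are text (adapter 0)
    and all remaining ids are image tokens (adapter 1).
    """
    return (np.asarray(token_ids) >= text_vocab_size).astype(int)


def route_by_gate(hidden, gate_weights):
    """Hypothetical learned router: top-1 over linear gate scores.

    hidden:       (N, d) token representations
    gate_weights: (d, K) one score column per adapter
    """
    return np.argmax(hidden @ gate_weights, axis=-1)


def apply_per_token_lora(hidden, base_w, lora_a, lora_b, adapter_ids):
    """Apply the base projection plus one LoRA update per token.

    hidden:      (N, d)        token representations
    base_w:      (d, d_out)    frozen base weight
    lora_a:      (K, d, r)     low-rank down-projections, one per adapter
    lora_b:      (K, r, d_out) low-rank up-projections, one per adapter
    adapter_ids: (N,)          adapter index chosen for each token
    """
    out = hidden @ base_w
    # Group tokens by their selected adapter so each token passes through
    # exactly one adapter: N token-adapter applications in total.
    for k in np.unique(adapter_ids):
        mask = adapter_ids == k
        out[mask] += (hidden[mask] @ lora_a[k]) @ lora_b[k]
    return out
```

In this sketch, either router produces an `adapter_ids` array that is passed to `apply_per_token_lora`. Because every token is processed by only its selected adapter, the adapter work grows with the number of tokens rather than with the number of tokens times the number of adapters.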

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling