Token-Level LLM Collaboration via FusionRoute

Nuoya Xiong; Yuhang Zhou; Hanqing Zeng; Zhaorun Chen; Furong Huang; Shuchao Bi; Lizhu Zhang; Zhuokai Zhao

arXiv:2601.05106 · cs.AI · May 22, 2026

TL;DR

FusionRoute is a token-level multi-LLM collaboration framework that improves performance by combining expert selection with a trainable complementary generator, addressing limitations of expert-only routing.

Contribution

The paper introduces FusionRoute, a token-level collaboration method that expands the effective policy class and enables near-optimal decoding by combining expert selection with logit addition.

Findings

01. FusionRoute outperforms existing collaboration and fine-tuning methods across multiple benchmarks.

02. Theoretical analysis shows expert-only routing has fundamental limitations without strong coverage assumptions.

03. Empirical results demonstrate FusionRoute's competitiveness with domain experts on various tasks.

Abstract

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that…
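
The two router roles described in the abstract, selecting one expert per decoding step and adding a complementary logit to that expert's logits, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the argmax selection rule, the hand-written routing scores, and the three-token vocabulary are all assumptions made for illustration. In the paper the router is a trained model; here its outputs are simply given as inputs.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_step(
    expert_logits: Sequence[Sequence[float]],
    routing_scores: Sequence[float],
    complementary_logits: Sequence[float],
) -> Tuple[int, List[float]]:
    """One decoding step of a hypothetical select-then-add scheme.

    expert_logits: next-token logits from each expert (one row per expert).
    routing_scores: the router's score for each expert (assumed given).
    complementary_logits: the router's own logit vector over the vocabulary.
    """
    # (i) Select a single expert for this step (assumption: highest score wins).
    chosen = max(range(len(routing_scores)), key=routing_scores.__getitem__)
    # (ii) Logit addition: shift the chosen expert's logits by the router's logits.
    fused = [e + c for e, c in zip(expert_logits[chosen], complementary_logits)]
    return chosen, softmax(fused)


# Toy inputs: two experts, vocabulary of three tokens.
expert_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
routing_scores = [0.3, 0.7]
complementary = [0.0, -2.0, 2.0]

chosen, probs = fused_step(expert_logits, routing_scores, complementary)
next_token = max(range(len(probs)), key=probs.__getitem__)
expert_only_token = max(range(3), key=expert_logits[chosen].__getitem__)
print(chosen, expert_only_token, next_token)
```

In this toy setting the selected expert alone would emit token 1, while the fused distribution favors token 2, which is the kind of correction the complementary logit is meant to provide. An expert-only router could never output a token distribution that none of the fixed experts produces, whereas the added logit can.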

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning