RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
Jize Wang, Han Wu, Zhiyuan You, Yiming Song, Yijun Wang, Zifei Shan, Yining Li, Songyang Zhang, Xinyi Le, Cailian Chen, Xinping Guan, Dacheng Tao

TL;DR
RouteMoA introduces a dynamic routing framework for mixture-of-agents (MoA) that reduces inference cost and latency. A lightweight scorer pre-selects promising models from the query alone, and a mixture of judges then refines those scores using existing outputs, so large model pools can be used efficiently.
Contribution
It presents a dynamic routing method that avoids running inference on every model in the pool, improving the efficiency and scalability of mixture-of-agents systems.
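The core idea of routing without pre-inference can be illustrated with a minimal sketch. It assumes a scorer that maps (query, model) to a predicted quality score and keeps the top `k` models, so that only those models are later called. The names `preselect_models` and `toy_scorer` and the model names are hypothetical, and the toy scorer is a lookup table standing in for the paper's learned lightweight scorer, whose details are not given here.

```python
from typing import Callable, List


def preselect_models(
    query: str,
    model_pool: List[str],
    scorer: Callable[[str, str], float],
    k: int,
) -> List[str]:
    """Rank models by a query-only predicted score and keep the top k.

    No candidate model is called here; only the lightweight scorer runs.
    """
    scores = {m: scorer(query, m) for m in model_pool}
    # Sort by predicted score (descending); break ties by name for determinism.
    ranked = sorted(model_pool, key=lambda m: (-scores[m], m))
    return ranked[:k]


# Hypothetical scorer: per-model "strength" tables chosen by a crude query test.
# A real scorer would be a small learned model over query features or embeddings.
def toy_scorer(query: str, model: str) -> float:
    math_skill = {"model-a": 0.9, "model-b": 0.4, "model-c": 0.7, "model-d": 0.2}
    general_skill = {"model-a": 0.5, "model-b": 0.8, "model-c": 0.6, "model-d": 0.3}
    is_math = any(ch.isdigit() for ch in query)
    table = math_skill if is_math else general_skill
    return table[model]


pool = ["model-a", "model-b", "model-c", "model-d"]
print(preselect_models("What is 17 * 23?", pool, toy_scorer, k=2))
# ['model-a', 'model-c']
```

Because the scorer never calls the candidate models, the number of full LLM calls per layer drops from the pool size to `k` under this sketch's assumptions.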
Findings
Reduces cost by 89.8% in large model pools
Decreases latency by 63.6%
Outperforms existing MoA methods across tasks
Abstract
Mixture-of-Agents (MoA) improves LLM performance through layered collaboration, but its dense topology raises costs and latency. Existing methods employ LLM judges to filter responses, yet still require all models to perform inference before judging, failing to cut costs effectively. They also lack model selection criteria and struggle with large model pools, where full inference is costly and can exceed context limits. To address this, we propose RouteMoA, an efficient mixture-of-agents framework with dynamic routing. It employs a lightweight scorer to perform initial screening by predicting coarse-grained performance from the query, narrowing candidates to a high-potential subset without inference. A mixture of judges then refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference.…
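The posterior correction step can be sketched as blending the query-only prior score with judge scores computed on answers that already exist. This is a minimal illustration under assumptions not stated in the abstract: `refine_scores` is a hypothetical helper, self-assessment is one rating per model of its own answer, cross-assessment is a list of ratings from other models, and the two sources are combined by a simple weighted average with a hypothetical weight `alpha`. The actual aggregation rule in RouteMoA may differ.

```python
from statistics import mean
from typing import Dict, List


def refine_scores(
    prior: Dict[str, float],
    self_scores: Dict[str, float],
    cross_scores: Dict[str, List[float]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend query-only prior scores with judge scores on existing outputs.

    self_scores[m]: model m's rating of its own answer (self-assessment).
    cross_scores[m]: ratings of m's answer by the other models (cross-assessment).
    alpha: hypothetical weight on the prior versus the judges' evidence.
    No model generates a new answer here; judges only rate outputs already produced.
    """
    refined = {}
    for m, p in prior.items():
        judge = mean([self_scores[m]] + cross_scores[m])
        refined[m] = alpha * p + (1 - alpha) * judge
    return refined


prior = {"model-a": 0.9, "model-c": 0.7}
self_scores = {"model-a": 0.6, "model-c": 0.9}
cross_scores = {"model-a": [0.4], "model-c": [0.8]}

refined = refine_scores(prior, self_scores, cross_scores)
print(max(refined, key=refined.get))  # model-c
```

In this toy case the prior prefers `model-a`, but the judges' ratings of the existing answers pull `model-c` ahead (0.775 versus 0.7), which is the kind of correction the abstract describes as "posterior".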
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques · Advanced Neural Network Applications
