Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou

TL;DR
This paper introduces a Mixture-of-Agents (MoA) architecture that combines multiple large language models in layered structures, significantly improving performance on various benchmarks over existing models like GPT-4 Omni.
Contribution
The paper proposes a novel layered MoA architecture that leverages multiple LLMs collectively, achieving state-of-the-art results with open-source models.
Findings
MoA surpasses GPT-4 Omni on key benchmarks.
Open-source LLMs with MoA outperform proprietary models.
MoA achieves 65.1% on AlpacaEval 2.0, outperforming previous methods.
Abstract
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents in the previous layer as auxiliary information in generating its response. MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs is the leader of AlpacaEval 2.0 by a substantial gap, achieving a score of 65.1% compared to 57.5% by…
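The layered flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Agent`, `build_prompt`, `run_moa`, and `make_stub_agent` are hypothetical, the prompt wording is invented for illustration, and the agents are stand-in functions rather than real LLM calls. It only shows the data flow in which every agent in a layer receives all responses from the previous layer as auxiliary context.

```python
from typing import Callable, List, Sequence

# Hypothetical agent type: any function mapping a prompt string to a response string.
# In practice this would wrap a call to an LLM; here it is left abstract.
Agent = Callable[[str], str]


def build_prompt(user_prompt: str, previous: Sequence[str]) -> str:
    """Attach all previous-layer responses to the user prompt as auxiliary context.

    The template text below is an assumption made for this sketch.
    """
    if not previous:
        return user_prompt  # first layer: agents see only the user prompt
    numbered = "\n".join(f"{i}. {resp}" for i, resp in enumerate(previous, start=1))
    return (
        "Responses from the previous layer:\n"
        f"{numbered}\n\n"
        f"User query: {user_prompt}"
    )


def run_moa(user_prompt: str, layers: Sequence[Sequence[Agent]]) -> List[str]:
    """Run the prompt through each layer in order and return the last layer's outputs."""
    previous: List[str] = []
    for layer in layers:
        prompt = build_prompt(user_prompt, previous)
        # Every agent in the layer gets the same prompt, which already contains
        # all outputs of the previous layer.
        previous = [agent(prompt) for agent in layer]
    return previous


def make_stub_agent(name: str) -> Agent:
    """Deterministic stand-in for an LLM, useful for checking the data flow."""
    def agent(prompt: str) -> str:
        return f"[{name}] reply to a prompt of {len(prompt)} characters"
    return agent
```

A call such as `run_moa("Explain MoA", [[make_stub_agent("a"), make_stub_agent("b")], [make_stub_agent("final")]])` returns a one-element list holding the last agent's response. Ending with a single-agent layer to obtain one final answer is a choice made in this sketch, not something the abstract specifies.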
Peer Reviews
Decision: ICLR 2025 Spotlight
1. To the best of my knowledge, the proposed framework is both novel and reasonable. MoA can be viewed as a specific method for combining multiple weaker models to create a stronger model.
2. The model's performance is competitive, showing improvements over GPT-4 Omni on three benchmarks.
1. The proposed model is more resource-intensive than single LLM-based models.
2. Most evaluations use LC metrics, with only a limited evaluation on MATH tasks included in the appendix. Further evaluations on diverse tasks are necessary to illustrate the general advantages of the proposed method.
3. An important question is how to select the set of proposal LLMs. Currently, the paper demonstrates two setups: one with relatively large models and one with smaller models. However, there is n
+ Proposal of a new effective framework to employ the collective intelligence of multiple LLMs.
+ Empirical evaluation on AlpacaEval 2.0, Arena-Hard, and MT-Bench verifies the effectiveness of the proposed solution.
+ Stacking LLMs into layers and revising the output obtained from previous layers seems like another form of model ensemble, and I would suggest including model ensemble as one of the comparative methods.
+ In Figure 6, the maximum number of TFLOPs among proposers in each MoA layer is used as an approximation of the total TFLOPs of the entire layer, since different proposers can run in parallel. However, the approximation is only reasonable when considering the inference latency for a single qu
- While collaborativeness has been harnessed in various ways, a layered funnel architecture in which earlier layers add information for later layers to consume and interplay to summarize these outputs efficiently to yield a final output has not been explored.
- The authors also thoroughly conducted their experiments to establish collaborativeness and benchmark various datasets. The usage of open-source models to showcase the results helps to make replicating these results possible.
- They al
- The idea of collaborativeness or hierarchical processing in LLMs is not exactly novel [1][2]; if you think of different layers in the architecture using the same model, this reduces to some form of iterative refinement of outputs as shown in [2].
- Some of the analysis in the paper to support budget analysis is unclear.
References
[1] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors (arXiv:2308.10848)
[2] LEGO: A Multi-agent Collaborative Framework with
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer
