Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
Wenzhe Li, Yong Lin, Mengzhou Xia, Chi Jin

TL;DR
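This paper questions whether mixing different large language models in an ensemble is actually beneficial. It proposes Self-MoA, which aggregates outputs from only the single top-performing model, outperforms standard Mixture-of-Agents in a large number of scenarios, and sets a new state of the art when applied to one of the top-ranking models in AlpacaEval 2.0.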
Contribution
The paper introduces Self-MoA, an ensemble method that aggregates outputs from only the single top-performing LLM, and shows that it outperforms standard Mixture-of-Agents (MoA) approaches that mix different LLMs in a large number of scenarios.
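The difference between the two ensembles can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Model` callable type, the prompt wording in `build_aggregation_prompt`, the `num_samples` parameter, and the idea of obtaining several outputs by calling the same model repeatedly are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical interface: a model is any function mapping a prompt to a response.
Model = Callable[[str], str]


def build_aggregation_prompt(query: str, candidates: List[str]) -> str:
    """Format candidate responses into one prompt for the aggregator (assumed wording)."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        "Synthesize the candidate responses below into one high-quality answer.\n"
        f"Query: {query}\n"
        f"Candidates:\n{numbered}"
    )


def mixed_moa(query: str, proposers: List[Model], aggregator: Model) -> str:
    """Standard MoA: one candidate from each of several different models."""
    candidates = [propose(query) for propose in proposers]
    return aggregator(build_aggregation_prompt(query, candidates))


def self_moa(query: str, top_model: Model, aggregator: Model, num_samples: int = 4) -> str:
    """Self-MoA: several candidates drawn from the single top-performing model."""
    # Assumes top_model samples stochastically, so repeated calls give varied outputs.
    candidates = [top_model(query) for _ in range(num_samples)]
    return aggregator(build_aggregation_prompt(query, candidates))
```

In practice `top_model` and `aggregator` would wrap calls to an LLM; here they are left as plain callables so the control flow is the only thing on display.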
Findings
Self-MoA outperforms standard MoA by 6.6% on AlpacaEval 2.0.
Self-MoA improves over MoA by an average of 3.8% across various benchmarks, including MMLU, CRUX, and MATH.
Mixing different LLMs can lower the average quality of the outputs being aggregated, which can make aggregating only the top model's outputs more effective.
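The last finding is easy to see with toy arithmetic. The quality scores below are made up purely for illustration and are not results from the paper; they only show why adding weaker models to the pool pulls down the average quality of what the aggregator sees.

```python
from statistics import mean

# Hypothetical per-model output quality on some 0-1 scale (illustrative only).
top_model_quality = 0.8
other_model_qualities = [0.6, 0.5]

# Mixed MoA: one output from each model, so weaker models dilute the pool.
mixed_pool = [top_model_quality] + other_model_qualities

# Self-MoA: the same number of outputs, all sampled from the top model.
self_pool = [top_model_quality] * len(mixed_pool)

mixed_average = mean(mixed_pool)
self_average = mean(self_pool)

print(f"mixed pool average quality: {mixed_average:.3f}")
print(f"self pool average quality:  {self_average:.3f}")
```

The mixed pool gains diversity across models but loses average quality, which is the trade-off the finding points to.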
Abstract
Ensembling outputs from diverse sources is a straightforward yet effective approach to boost performance. Mixture-of-Agents (MoA) is one such popular ensemble method that aggregates outputs from multiple different Large Language Models (LLMs). This paper raises the question in the context of language models: is mixing different LLMs truly beneficial? We propose Self-MoA, an ensemble method that aggregates outputs from only the single top-performing LLM. Our extensive experiments reveal that, surprisingly, Self-MoA outperforms standard MoA that mixes different LLMs in a large number of scenarios: Self-MoA achieves a 6.6% improvement over MoA on the AlpacaEval 2.0 benchmark, and an average improvement of 3.8% across various benchmarks, including MMLU, CRUX, and MATH. Applying Self-MoA to one of the top-ranking models in AlpacaEval 2.0 directly achieves the new state-of-the-art…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
