The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants

Yiqun Zhang; Hao Li; Chenxu Wang; Linyao Chen; Qiaosheng Zhang; Peng Ye; Shi Feng; Daling Wang; Zhen Wang; Xinrun Wang; Jia Xu; Lei Bai; Wanli Ouyang; Shuyue Hu

arXiv:2505.19797·cs.CL·June 19, 2025

TL;DR

The paper introduces the Avengers, a simple ensemble recipe that combines smaller open-source language models through embedding, clustering, scoring, and voting, outperforming larger proprietary models such as GPT-4o and GPT-4.1 on diverse tasks.

Contribution

The paper presents a novel ensemble approach that leverages the collective intelligence of smaller open-source models, achieving competitive performance against proprietary giants across multiple benchmarks.

Findings

01. Outperforms GPT-4 variants on 15 datasets

02. Surpasses GPT-4.1 in mathematics and coding tasks

03. Demonstrates strong out-of-distribution generalization

Abstract

Proprietary giants are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers -- a simple recipe that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, 4.1,…
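The four operations in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the hashed bag-of-words `embed` function stands in for a real text embedding model, the k-means routine with deterministic seeding is one possible clustering choice, and the model names `math-7b` and `code-7b`, the calibration queries, and the correctness table are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def embed(text, dim=16):
    """Toy stand-in for a text embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(ord(ch) for ch in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))

def kmeans(vectors, k, iters=10):
    """Plain k-means; seeding with the first k vectors is an assumption for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, members in groups.items():
            centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def score_models(vectors, correct, centroids):
    """correct[m][j] is True if model m answered calibration query j correctly.
    Returns scores[cluster][model] = accuracy of that model within the cluster."""
    hits = defaultdict(lambda: defaultdict(list))
    for j, v in enumerate(vectors):
        c = nearest(v, centroids)
        for model, results in correct.items():
            hits[c][model].append(results[j])
    return {c: {m: sum(r) / len(r) for m, r in by_model.items()}
            for c, by_model in hits.items()}

def route(query, centroids, scores, top_n=1):
    """Embed the query, find its nearest cluster, return the top-scoring model(s).
    Returns an empty list if the cluster had no calibration queries."""
    cluster_scores = scores.get(nearest(embed(query), centroids), {})
    ranked = sorted(cluster_scores, key=cluster_scores.get, reverse=True)
    return ranked[:top_n]

def vote(answers):
    """Majority vote over repeated samples from the selected model(s)."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical calibration set: two "math-like" and two "code-like" queries.
calibration = ["add sum", "loop bug", "add add sum", "loop loop bug"]
correct = {
    "math-7b": [True, False, True, False],   # hypothetical per-query correctness
    "code-7b": [False, True, False, True],
}
vectors = [embed(q) for q in calibration]
centroids = kmeans(vectors, k=2)
scores = score_models(vectors, correct, centroids)

chosen = route("sum add add", centroids, scores)
# In practice the chosen model would be sampled several times; here the samples are made up.
final_answer = vote(["42", "41", "42"])
```

Routing and voting are kept as separate functions so that the selection step (which model answers) and the aggregation step (which sampled answer wins) can be swapped independently, for example by replacing the toy embedding with a real encoder.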

Code & Models

Repositories

zhangyiqun018/avengers (Official)

Taxonomy

Topics: Scientific Computing and Data Management

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Layer Normalization · Byte Pair Encoding