Don't Always Pick the Highest-Performing Model: An Information Theoretic View of LLM Ensemble Selection

Yigit Turkmen; Baturalp Buyukates; Melih Bastopcu

arXiv:2602.08003·cs.LG·February 10, 2026

Open Access

TL;DR

This paper introduces an information-theoretic approach to selecting models for LLM ensembles and shows that greedy selection by mutual information can outperform baseline selection strategies, particularly when the models' predictions are strongly correlated.

Contribution

The paper formulates ensemble selection as maximizing mutual information and proposes a greedy algorithm that accounts for model correlation, improving ensemble performance under query budgets.
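The selection objective described above can be written compactly. The following is a sketch under assumed notation, not the paper's own statement: $Y$ is the true label, $X_m$ is the prediction of model $m$ out of $M$ candidates, $X_S$ is the tuple of predictions from the models in $S$, and $k$ is the query budget. None of these symbols appear in the listing itself.

```latex
% Budgeted selection: pick at most k models whose joint predictions
% are most informative about the true label.
S^{\star} \;=\; \arg\max_{S \subseteq \{1,\dots,M\},\; |S| \le k} \; I\bigl(Y;\, X_S\bigr)

% Chain rule of mutual information: the gain from adding model m to S.
I\bigl(Y;\, X_{S \cup \{m\}}\bigr) \;=\; I\bigl(Y;\, X_S\bigr) \;+\; I\bigl(Y;\, X_m \mid X_S\bigr)

% A greedy step therefore adds the model with the largest conditional gain.
m_t \;=\; \arg\max_{m \notin S_{t-1}} \; I\bigl(Y;\, X_m \mid X_{S_{t-1}}\bigr)
```

The chain rule shows why correlation matters: a model that is accurate on its own but nearly duplicates models already in $S$ has a small conditional term $I(Y; X_m \mid X_S)$, so it adds little to the ensemble.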

Findings

01. The proposed method outperforms baseline ensemble selection strategies.

02. Model correlation causes performance saturation, which can be mitigated by mutual information-based selection.

03. The approach is validated on multiple NLP datasets, showing consistent improvements.
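The saturation effect in the second finding can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: a binary task, majority voting, an equicorrelated Gaussian copula over model errors, and illustrative values for the per-model error rate `p_err` and correlation `rho`. The function name `majority_vote_accuracy` is hypothetical.

```python
import random
from statistics import NormalDist


def majority_vote_accuracy(num_models, rho, p_err=0.3, trials=4000, seed=0):
    """Monte Carlo accuracy of a majority vote over models with correlated errors.

    Errors follow an equicorrelated Gaussian copula (an assumption for this
    sketch): model i errs when its latent Gaussian falls below a threshold,
    and all latents share one common factor with pairwise correlation rho.
    """
    rng = random.Random(seed)
    # Threshold chosen so each model errs with marginal probability p_err.
    threshold = NormalDist().inv_cdf(p_err)
    correct = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # common factor shared by all models
        errors = 0
        for _ in range(num_models):
            private = rng.gauss(0.0, 1.0)  # model-specific noise
            latent = (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * private
            errors += latent < threshold
        # The vote is correct when fewer than half of the models err
        # (use an odd num_models to avoid ties).
        correct += errors * 2 < num_models
    return correct / trials


for m in (1, 5, 25):
    independent = majority_vote_accuracy(m, rho=0.0)
    correlated = majority_vote_accuracy(m, rho=0.6)
    print(f"models={m:2d}  independent={independent:.3f}  correlated={correlated:.3f}")
```

With independent errors, accuracy keeps climbing as models are added. With a strong shared factor, it levels off well below 1, because a bad draw of the shared factor pushes most models into error at once and additional voters cannot fix that.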

Abstract

Large language models (LLMs) are often ensembled to improve overall reliability and robustness, but in practice models are strongly correlated. This raises a fundamental question: which models should be selected when forming an LLM ensemble? We formulate budgeted ensemble selection as maximizing the mutual information between the true label and the predictions of the selected models. Furthermore, to explain why performance can saturate even with many models, we model the correlated errors of the models using a Gaussian copula and show an information-theoretic error floor for the performance of the ensemble. Motivated by these results, we propose a simple greedy mutual-information selection algorithm that estimates the required information terms directly from data and iteratively builds an ensemble under a query budget. We test our approach on two question answering datasets and one binary…
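The greedy procedure described in the abstract can be sketched as follows. This is an illustration, not the authors' implementation: it assumes discrete predictions and uses a plug-in (empirical count) estimate of mutual information, which may differ from the estimator in the paper. The function names, model names, and toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(labels, columns):
    """Plug-in estimate of I(Y; X_S) in bits from empirical counts.

    `columns` holds one prediction list per selected model; the joint
    prediction for each example is the tuple of those models' outputs.
    """
    n = len(labels)
    joint_x = list(zip(*columns)) if columns else [()] * n
    count_y = Counter(labels)
    count_x = Counter(joint_x)
    count_xy = Counter(zip(labels, joint_x))
    return sum(
        (c / n) * log2(c * n / (count_y[y] * count_x[x]))
        for (y, x), c in count_xy.items()
    )


def greedy_select(labels, predictions, budget):
    """Add, one at a time, the model that most increases the estimated MI."""
    selected = []
    remaining = sorted(predictions)
    while remaining and len(selected) < budget:
        best = max(
            remaining,
            key=lambda m: mutual_information(
                labels, [predictions[s] for s in selected + [m]]
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): two identical strong models and one weaker model
# whose mistakes fall on different examples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = {
    "strong": [1, 0, 0, 0, 1, 1, 1, 1],        # 7/8 correct
    "strong_clone": [1, 0, 0, 0, 1, 1, 1, 1],  # identical to "strong"
    "weak_diverse": [0, 0, 0, 1, 1, 1, 1, 0],  # 6/8 correct, different errors
}
chosen = greedy_select(labels, predictions, budget=2)
print(chosen)
```

On this toy data, ranking by accuracy alone would pair the two identical strong models, which carry no more information together than either does alone. The greedy rule instead pairs the strong model with the weaker but less redundant one. Note that plug-in estimates become unreliable as the number of joint prediction patterns grows relative to the sample size, so a practical version would need more data or a smoothed estimator.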

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Multimodal Machine Learning Applications