Submodular Benchmark Selection

Alexander Smola

arXiv:2605.02209·cs.AI·May 5, 2026

TL;DR

This paper introduces a submodular optimization framework for selecting a small, informative subset of benchmarks so that large language models can be evaluated more cheaply, using entropy and mutual information under a Gaussian model as selection objectives.

Contribution

It formalizes benchmark selection as submodular maximization under a multivariate Gaussian model, with entropy (log-determinant of the covariance) and mutual information between selected and remaining benchmarks as objectives, and compares the two empirically.
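
For reference, the two objectives can be written out using the standard Gaussian identities. The notation here is an assumption rather than the paper's own: $V$ is the full set of benchmarks, $S \subseteq V$ the selected subset, and $\Sigma$ the covariance of benchmark scores across models.

```latex
% Differential entropy of the selected benchmarks (Gaussian case)
H(X_S) = \tfrac{1}{2} \log\det\!\left(2\pi e\,\Sigma_{SS}\right)
       = \tfrac{|S|}{2}\log(2\pi e) + \tfrac{1}{2}\log\det \Sigma_{SS}

% Mutual information between selected and remaining benchmarks
I\!\left(X_S ; X_{V\setminus S}\right)
  = H(X_S) + H(X_{V\setminus S}) - H(X_V)
  = \tfrac{1}{2}\left[\log\det \Sigma_{SS}
      + \log\det \Sigma_{V\setminus S,\,V\setminus S}
      - \log\det \Sigma_{VV}\right]
```

The $2\pi e$ terms cancel in the mutual information because $|S| + |V \setminus S| = |V|$, so only log-determinants of covariance blocks remain.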

Findings

01

Mutual information selection outperforms entropy for small subset imputation.

02

Entropy selection coincides with pivoted Cholesky and has spectral residual bounds.

03

Experiments use three matrices built from ten public leaderboards.
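
The link between entropy selection and pivoted Cholesky can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a positive semidefinite covariance matrix `cov` over benchmarks, and the function name `greedy_entropy_selection` and the tolerance `tol` are hypothetical. Each step picks the benchmark with the largest residual (conditional) variance, which is the diagonal pivot of a pivoted Cholesky factorization and also the greedy step for the log-determinant objective.

```python
import numpy as np

def greedy_entropy_selection(cov, k, tol=1e-12):
    """Greedy log-det maximization, written as a pivoted Cholesky sweep.

    cov : (n, n) positive semidefinite covariance of benchmark scores (assumed).
    k   : number of benchmarks to select.
    Returns the list of selected benchmark indices, in selection order.
    """
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    resid = np.diag(cov).copy()   # variance of each benchmark given the selected set
    factor = np.zeros((n, k))     # columns of the partial Cholesky factor
    selected = []
    for j in range(k):
        i = int(np.argmax(resid))
        if resid[i] <= tol:       # nothing left to explain
            break
        selected.append(i)
        # New Cholesky column: residual covariance with benchmark i, scaled by the pivot.
        col = (cov[:, i] - factor[:, :j] @ factor[i, :j]) / np.sqrt(resid[i])
        factor[:, j] = col
        # Conditioning on benchmark i shrinks every residual variance.
        resid = np.maximum(resid - col ** 2, 0.0)
        resid[selected] = 0.0     # guard against round-off on chosen pivots
    return selected

# Toy example (made-up numbers): benchmarks 0 and 1 are highly correlated,
# benchmark 2 is independent of both.
toy_cov = np.array([[4.0, 3.8, 0.0],
                    [3.8, 3.9, 0.0],
                    [0.0, 0.0, 1.0]])
toy_selected = greedy_entropy_selection(toy_cov, 2)
```

In the toy matrix, the first pick is the highest-variance benchmark; once it is selected, its correlated neighbor has little residual variance left, so the independent benchmark is picked next.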

Abstract

Evaluating large language models across many benchmarks is expensive, yet many benchmarks are highly correlated. We formalize the selection of a small, informative subset as submodular maximization under a multivariate Gaussian model. Entropy (log-determinant covariance) and mutual information between selected and remaining benchmarks arise as natural objectives. Both are submodular; entropy selection coincides with pivoted Cholesky and has spectral residual bounds, while mutual information is non-monotone in general but empirically monotone for small subsets, so we optimize it greedily. Experiments on three matrices from ten public leaderboards show that mutual information selection outperforms entropy for imputation at small subsets.
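
Greedy optimization of the mutual information objective can be sketched as follows. This is a brute-force illustration under stated assumptions, not the paper's code: `cov` is assumed to be a positive definite covariance over benchmarks, the helper names `logdet`, `gaussian_mi`, and `greedy_mi_selection` are hypothetical, and the objective is re-evaluated from log-determinants at every step rather than updated incrementally.

```python
import numpy as np

def logdet(cov, idx):
    """Log-determinant of the covariance block indexed by idx (0.0 for an empty block)."""
    if len(idx) == 0:
        return 0.0
    _, value = np.linalg.slogdet(cov[np.ix_(idx, idx)])
    return value

def gaussian_mi(cov, subset):
    """I(X_S; X_rest) for a Gaussian with covariance cov (assumed positive definite)."""
    n = cov.shape[0]
    subset = list(subset)
    rest = [i for i in range(n) if i not in subset]
    everything = list(range(n))
    return 0.5 * (logdet(cov, subset) + logdet(cov, rest) - logdet(cov, everything))

def greedy_mi_selection(cov, k):
    """Add, k times, the benchmark that yields the largest mutual information."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    selected = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in selected]
        scores = [gaussian_mi(cov, selected + [i]) for i in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

# Toy example (made-up numbers): benchmarks 0 and 1 are highly correlated,
# benchmark 2 is independent of both.
toy_cov = np.array([[4.0, 3.8, 0.0],
                    [3.8, 3.9, 0.0],
                    [0.0, 0.0, 1.0]])
first_pick = greedy_mi_selection(toy_cov, 1)[0]
```

In this toy matrix the independent benchmark carries zero mutual information with the others, so the first greedy pick is one of the two correlated benchmarks. Because the objective is non-monotone in general, a practical version might stop when the best score no longer increases; that stopping rule is an assumption here, not something the abstract specifies.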
