EffiEval: Efficient and Generalizable Model Evaluation via Capability Coverage Maximization

Yaoning Wang; Jiahao Ying; Yixin Cao; Yubo Ma; Yugang Jiang

arXiv:2508.09662 · cs.CL · August 14, 2025


TL;DR

EffiEval is a training-free method for efficient benchmarking of large language models. It adaptively selects a small set of representative samples, reducing computational cost while preserving evaluation quality, fairness, and generalizability.

Contribution

We introduce EffiEval, a novel approach that selects representative evaluation subsets based on the Model Utility Index, enabling reliable, fair, and scalable model assessment without requiring extensive evaluation data or introducing performance bias.
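This summary does not spell out the selection algorithm. As a rough illustration of what "capability coverage maximization" can mean, here is a minimal greedy max-coverage sketch in Python. It assumes each benchmark sample has already been mapped to a set of capability units (hypothetically, indices of neurons that a probe model activates on that sample, in the spirit of the Model Utility Index). The function name, the toy data, and the greedy strategy are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Optional, Set


def greedy_coverage_selection(
    sample_units: Dict[str, Set[int]], budget: int
) -> List[str]:
    """Greedily pick up to `budget` samples that maximize the union of covered units.

    `sample_units` maps a sample id to its set of capability units (a hypothetical
    representation, e.g. activated neuron indices). No model accuracy scores are
    used, so the selection does not depend on how well any model performs.
    """
    selected: List[str] = []
    covered: Set[int] = set()
    remaining = dict(sample_units)
    while remaining and len(selected) < budget:
        # Find the sample that adds the most not-yet-covered units.
        best_id: Optional[str] = None
        best_gain = 0
        for sid, units in remaining.items():
            gain = len(units - covered)
            if gain > best_gain:
                best_id, best_gain = sid, gain
        if best_id is None:  # no remaining sample adds new coverage
            break
        selected.append(best_id)
        covered |= remaining.pop(best_id)
    return selected


# Toy example: sample id -> hypothetical set of activated units.
samples = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 2}}
print(greedy_coverage_selection(samples, budget=2))  # prints ['c', 'a']
```

Greedy selection is the standard approximation for maximum coverage: because coverage is monotone and submodular, it is guaranteed to reach at least a (1 - 1/e) fraction of the optimal coverage for a fixed budget. In the toy example, "b" and "d" are never chosen because "c" and "a" already cover all their units.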

Findings

01. Achieves high ranking consistency with full-benchmark evaluation while using only a small subset of the data.

02. Maintains fairness by selecting samples independently of model performance.

03. Remains flexible and scalable, balancing efficiency against representativeness.
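Ranking consistency between a subset and the full benchmark is commonly measured with Kendall's tau over the models' scores; the exact metric the paper reports is not stated in this summary. The sketch below computes Kendall's tau-a in plain Python. The model names and scores are made-up placeholders for illustration, not results from the paper.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(full: Dict[str, float], subset: Dict[str, float]) -> float:
    """Kendall's tau-a between model scores on the full benchmark and on a subset.

    +1 means the subset orders every pair of models exactly as the full benchmark
    does; -1 means every pair is reversed. Pairs tied in either score count as
    neither concordant nor discordant (tau-a convention).
    """
    models = sorted(full)
    n = len(models)
    if n < 2:
        raise ValueError("need at least two models")
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        # Same sign of score difference on both evaluations -> concordant pair.
        sign = (full[m1] - full[m2]) * (subset[m1] - subset[m2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical accuracies (placeholders, not reported numbers).
full = {"A": 0.82, "B": 0.75, "C": 0.61, "D": 0.58}
subset = {"A": 0.80, "B": 0.77, "C": 0.55, "D": 0.60}
print(round(kendall_tau(full, subset), 3))  # prints 0.667
```

Here only the C/D pair is swapped by the subset, giving 5 concordant and 1 discordant pair out of 6, so tau = 4/6.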

Abstract

The rapid advancement of large language models (LLMs) and the development of increasingly large and diverse evaluation benchmarks have introduced substantial computational challenges for model assessment. In this paper, we present EffiEval, a training-free approach for efficient benchmarking that effectively addresses data redundancy while maintaining high evaluation reliability. Our method is specifically designed to meet three key criteria for high-quality evaluation: representativeness, by ensuring comprehensive coverage of model capabilities; fairness, by remaining independent of model performance during sample selection to avoid bias; and generalizability, by enabling flexible transfer across datasets and model families without reliance on large-scale evaluation data. Unlike traditional methods that rely on absolute performance or require extensive evaluation data, our approach…
