EffiEval: Efficient and Generalizable Model Evaluation via Capability Coverage Maximization
Yaoning Wang, Jiahao Ying, Yixin Cao, Yubo Ma, Yugang Jiang

TL;DR
EffiEval is a training-free benchmarking method for large language models that adaptively selects representative samples, cutting computational cost while preserving evaluation quality, fairness, and generalizability.
Contribution
We introduce EffiEval, a novel approach that selects representative evaluation subsets based on the Model Utility Index, enabling reliable, fair, and scalable model assessment without requiring extensive evaluation data or introducing performance bias.
Findings
Achieves high ranking consistency with full evaluation using only a small data subset.
Maintains fairness by selecting samples independently of model performance.
Flexible and scalable, balancing efficiency and representativeness.
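To make "ranking consistency" concrete, the sketch below compares how a set of models is ordered on the full benchmark versus on a subset. It uses Kendall's tau as the agreement measure, which is an assumption here: this summary does not say which consistency metric the paper reports. The function name, model names, and scores are all made up for illustration and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(full_scores, subset_scores):
    """Kendall's tau-a between two score dicts keyed by model name.

    Counts model pairs ordered the same way (concordant) or the opposite
    way (discordant) under the two evaluations; ties count as neither.
    """
    models = sorted(full_scores)
    concordant = discordant = 0
    for a, b in combinations(models, 2):
        sign = (full_scores[a] - full_scores[b]) * (subset_scores[a] - subset_scores[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical accuracies, invented for this example only.
full = {"model_a": 0.71, "model_b": 0.64, "model_c": 0.58, "model_d": 0.49}
subset = {"model_a": 0.70, "model_b": 0.60, "model_c": 0.62, "model_d": 0.45}

tau = kendall_tau(full, subset)
print(f"Kendall tau between full and subset rankings: {tau:.3f}")
```

A value of 1.0 means the subset reproduces the full-benchmark ordering exactly; here one swapped pair (model_b and model_c) out of six lowers it.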
Abstract
The rapid advancement of large language models (LLMs) and the development of increasingly large and diverse evaluation benchmarks have introduced substantial computational challenges for model assessment. In this paper, we present EffiEval, a training-free approach for efficient benchmarking that effectively addresses data redundancy while maintaining high evaluation reliability. Our method is specifically designed to meet three key criteria for high-quality evaluation: representativeness, by ensuring comprehensive coverage of model capabilities; fairness, by remaining independent of model performance during sample selection to avoid bias; and generalizability, by enabling flexible transfer across datasets and model families without reliance on large-scale evaluation data. Unlike traditional methods that rely on absolute performance or require extensive evaluation data, our approach…
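The title frames subset selection as capability coverage maximization. As a rough illustration of that idea only, the sketch below runs a generic greedy maximum-coverage heuristic over samples tagged with capability identifiers. It is not the paper's algorithm: the capability tags, the function name `greedy_coverage_subset`, and the budget parameter are all hypothetical, and how EffiEval actually derives capabilities is not described in this summary. The selection uses no model scores, which mirrors the fairness criterion stated in the abstract.

```python
def greedy_coverage_subset(sample_capabilities, budget):
    """Greedily pick up to `budget` sample ids that cover the most capabilities.

    `sample_capabilities` maps a sample id to a set of (hypothetical)
    capability tags. No model performance is consulted.
    """
    covered = set()
    chosen = []
    remaining = dict(sample_capabilities)
    while remaining and len(chosen) < budget:
        # Sorting the ids makes tie-breaking deterministic.
        best_id = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        gain = remaining[best_id] - covered
        if not gain:
            break  # nothing new can be covered, so stop early
        chosen.append(best_id)
        covered |= gain
        del remaining[best_id]
    return chosen, covered


# Toy data: five samples tagged with made-up capability ids.
samples = {
    "q1": {"c1", "c2", "c3"},
    "q2": {"c2", "c3"},
    "q3": {"c4", "c5"},
    "q4": {"c1", "c4"},
    "q5": {"c5"},
}

chosen, covered = greedy_coverage_subset(samples, budget=2)
print(chosen, sorted(covered))
```

In this toy case two of the five samples already cover every capability, which is the kind of redundancy a coverage-based selector is meant to exploit.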
