Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling
Cong Xu, Gayathri Saranathan, Mahammad Parwez Alam, Arpit Shah, James Lim, Soon Yee Wong, Foltin Martin, Suparna Bhattacharya

TL;DR
SubLIME introduces an adaptive sampling framework that significantly reduces the computational cost of evaluating large language and text-to-image models while maintaining accurate model rankings and performance insights.
Contribution
The paper presents SubLIME, a novel adaptive sampling method that efficiently evaluates models across diverse benchmarks with minimal data, ensuring reliable rankings and reducing evaluation costs.
Findings
Quality-based sampling achieves high Pearson correlation (0.85 to 0.95) with full-dataset results at a 10% sampling rate.
Clustering methods perform well on specific benchmarks like MMLU.
A 1% sampling rate is effective for certain benchmarks like MMLU.
Abstract
Evaluating LLMs and text-to-image models is a computationally intensive and often overlooked task. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as clustering and quality-based methods, to create representative subsets of benchmarks. Our approach ensures statistically aligned model rankings compared to full datasets, evidenced by high Pearson correlation coefficients. Empirical analysis across six NLP benchmarks reveals that: (1) quality-based sampling methods such as Quality SE and Quality CPD consistently achieve strong correlations (0.85 to 0.95) with full datasets at a 10% sampling rate; (2) clustering methods excel in specific benchmarks such as…
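The check described in the abstract, comparing model scores on a sampled subset against scores on the full benchmark via Pearson correlation, can be sketched as follows. This is an illustrative sketch only, not SubLIME's implementation: it uses uniform random sampling as a stand-in for the paper's quality-based and clustering samplers, the per-item results are synthetic, and the names `pearson` and `subset_correlation` are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def subset_correlation(results, rate, seed=0):
    """Correlate per-model accuracy on a sampled subset with full-benchmark accuracy.

    results: dict mapping model name -> list of 0/1 per-item correctness,
             with every list covering the same benchmark items in the same order.
    rate:    fraction of benchmark items to keep (e.g. 0.10 for 10%).

    NOTE: uniform random sampling is an assumption made for this sketch;
    it is not one of the samplers evaluated in the paper.
    """
    n_items = len(next(iter(results.values())))
    k = max(1, round(rate * n_items))
    idx = random.Random(seed).sample(range(n_items), k)
    full_scores = [sum(r) / n_items for r in results.values()]
    sub_scores = [sum(r[i] for i in idx) / k for r in results.values()]
    return pearson(full_scores, sub_scores)


# Synthetic stand-in data: five hypothetical models, 2000 benchmark items.
_rng = random.Random(42)
_true_acc = [0.30, 0.45, 0.60, 0.75, 0.90]
results = {
    f"model_{j}": [1 if _rng.random() < p else 0 for _ in range(2000)]
    for j, p in enumerate(_true_acc)
}

r_10 = subset_correlation(results, rate=0.10)
print(f"Pearson r between 10% subset and full benchmark: {r_10:.3f}")
```

Swapping the `idx` selection for a different sampler while keeping the correlation check fixed would be one way to compare sampling strategies at a given rate.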
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer
