BenTo: Benchmark Task Reduction with In-Context Transferability

Hongyu Zhao; Ming Li; Lichao Sun; Tianyi Zhou

arXiv:2410.13804·cs.CL·October 23, 2024

TL;DR

This paper introduces BenTo, a method that shrinks large language model benchmarks by selecting a representative subset of tasks based on in-context transferability, cutting a benchmark to about 5% of its tasks while changing evaluation results by less than 4%.

Contribution

BenTo is a transferability-based task selection method that is training-free and gradient-free. It reduces a benchmark to 5% of its tasks with less than 4% evaluation discrepancy.

Findings

01  Reduces benchmark tasks to 5% of the original size

02  Maintains less than 4% difference in evaluation results

03  Method is training-free and highly efficient

Abstract

Evaluating large language models (LLMs) is costly: it requires the generation and examination of LLM outputs on a large-scale benchmark of various tasks. This paper investigates how to efficiently reduce the tasks used to benchmark LLMs without affecting the evaluation quality. Our study reveals that task transferability and relevance provide critical information to identify the most representative subset of tasks via optimizing a facility location function. We propose a practically efficient metric for estimating the transferability between two tasks via in-context learning (ICL). By analyzing the pairwise transferability, we can reduce tasks in a modern LLM benchmark (e.g., MMLU or FLAN) to 5% while inducing only a <4% difference from the evaluation on the original benchmark. Compared to prior works, our method is training-free, gradient-free, and highly efficient, requiring ICL only.
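The subset-selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a non-negative task-by-task similarity matrix `sim` is already available (for example, derived from pairwise transferability scores; how the paper builds that matrix is not reproduced here) and maximizes the standard facility location objective f(A) = sum_j max_{i in A} sim[i, j] with the usual greedy heuristic. The function name `greedy_facility_location` and the toy matrix are hypothetical, and the greedy solver is a common choice for this objective rather than a confirmed detail of the authors' implementation.

```python
import numpy as np


def greedy_facility_location(sim: np.ndarray, k: int) -> list[int]:
    """Greedily pick k tasks that maximize f(A) = sum_j max_{i in A} sim[i, j].

    Assumes sim is a square matrix of non-negative similarities, where
    sim[i, j] measures how well task i represents task j (an assumption of
    this sketch, not a definition taken from the paper).
    """
    n = sim.shape[0]
    if sim.shape != (n, n):
        raise ValueError("sim must be a square matrix")
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of tasks")

    selected: list[int] = []
    chosen = np.zeros(n, dtype=bool)
    # best_cover[j]: similarity of task j to its closest selected task so far.
    best_cover = np.zeros(n, dtype=float)

    for _ in range(k):
        # Marginal gain of adding each candidate i to the current selection.
        gains = np.maximum(sim, best_cover).sum(axis=1) - best_cover.sum()
        gains[chosen] = -np.inf  # never pick the same task twice
        pick = int(np.argmax(gains))
        selected.append(pick)
        chosen[pick] = True
        best_cover = np.maximum(best_cover, sim[pick])
    return selected


# Toy example: tasks 0-1 and tasks 2-3 form two clusters of similar tasks.
toy_sim = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])
subset = greedy_facility_location(toy_sim, k=2)
print(subset)  # one task from each cluster
```

With a budget of two tasks, the greedy rule picks one representative per cluster, because a second task from an already covered cluster adds little to the objective. Facility location is monotone submodular, so greedy selection is a standard approximate maximizer for it.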

Code & Models

Repositories

tianyi-lab/bento (Official)
