Query-efficient model evaluation using cached responses
Hayden Helm, Ben Johnson, Carey Priebe

TL;DR
This paper uses the Data Kernel Perspective Space (DKPS) to predict a new model's benchmark performance from cached responses of previously evaluated models, reducing the number of queries needed for an accurate estimate.
Contribution
It introduces DKPS-based techniques for model evaluation that are provably query-efficient under certain conditions and that empirically achieve comparable accuracy with fewer queries.
Findings
DKPS-based methods match baseline accuracy with fewer queries
Theoretical proof of query efficiency under certain conditions
Offline query selection improves prediction accuracy
Abstract
Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice, responses from previously evaluated models are often cached, creating a potential opportunity to use this additional information to decrease the number of queries required to accurately evaluate a new model. In this paper, we introduce an approach for predicting benchmark performance from cached model responses. The approach is based on the Data Kernel Perspective Space (DKPS), a method for quantifying the relationship between models in the black-box setting. Theoretically, we show that DKPS-based methods are query-efficient under certain conditions. Empirically, we demonstrate that DKPS-based methods achieve the same mean absolute error as baselines…
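The abstract does not spell out the construction, so the following is only a rough sketch of what a DKPS-style predictor might look like, not the authors' implementation. It assumes each response has already been turned into a fixed-length embedding vector (one per model and query), measures model-to-model distance on the queried subset, places all models in a low-dimensional space with classical multidimensional scaling, and fits an ordinary least-squares regression from those coordinates to the cached benchmark scores. The function names `classical_mds` and `predict_score`, the array shapes, and the choice of linear regression are all hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a pairwise distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]     # indices of the largest ones
    # Clip tiny negative eigenvalues caused by floating-point error.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def predict_score(cached_emb, cached_scores, new_emb, query_idx, n_components=2):
    """Predict the new model's benchmark score from a subset of queries.

    Assumed (hypothetical) shapes:
      cached_emb    (n_models, n_queries, dim)  cached response embeddings
      cached_scores (n_models,)                 known benchmark scores
      new_emb       (len(query_idx), dim)       new model's embeddings on the subset
    """
    subset = cached_emb[:, query_idx, :]
    stacked = np.concatenate([subset, new_emb[None]], axis=0)  # new model goes last
    flat = stacked.reshape(stacked.shape[0], -1)
    # Pairwise distance between models over the queried subset only.
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.linalg.norm(diff, axis=-1) / np.sqrt(len(query_idx))
    coords = classical_mds(dist, n_components)
    # Assumption: a linear map from coordinates to score; fit on cached models only.
    design = np.column_stack([np.ones(len(coords)), coords])
    weights, *_ = np.linalg.lstsq(design[:-1], cached_scores, rcond=None)
    return float(design[-1] @ weights)
```

In this sketch the new model is only ever queried on `query_idx`; every other input comes from the cache, which is where any query savings would come from. A different distance, embedding dimension, or regressor could be swapped in without changing the overall shape of the procedure.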
