Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data

Can Wang; Dianbo Sui; Hongliang Sun; Hao Ding; Bolin Zhang; Zhiying Tu

arXiv:2410.07737·cs.PF·October 11, 2024

TL;DR

This paper presents a plug-and-play method for estimating LLM service performance across tasks from only a few unlabeled samples, using negative log-likelihood and perplexity as key features.

Contribution

It introduces a novel performance estimation approach for LLM services that requires no labeled data and can be applied directly during service invocation.

Findings

01 Effective performance estimation using NLL and perplexity features

02 Comparison shows superiority over baseline methods

03 Demonstrated applicability in service selection and optimization

Abstract

Large Language Model (LLM) services exhibit impressive capability on unlearned tasks leveraging only a few examples by in-context learning (ICL). However, the success of ICL varies depending on the task and context, leading to heterogeneous service quality. Directly estimating the performance of LLM services at each invocation can be laborious, especially requiring abundant labeled data or internal information within the LLM. This paper introduces a novel method to estimate the performance of LLM services across different tasks and contexts, which can be "plug-and-play" utilizing only a few unlabeled samples like ICL. Our findings suggest that the negative log-likelihood and perplexity derived from LLM service invocation can function as effective and significant features. Based on these features, we utilize four distinct meta-models to estimate the performance of LLM services. Our…
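
The two features named in the abstract can be illustrated with a minimal sketch. It assumes the service returns per-token natural-log probabilities for each response, which the abstract does not specify; the function names are hypothetical and this is not the authors' implementation. Mean negative log-likelihood is the negated average token log-probability, and perplexity is its exponential.

```python
import math


def nll_and_perplexity(token_logprobs):
    """Return (mean NLL, perplexity) for one response.

    token_logprobs: per-token natural-log probabilities (assumed to be
    exposed by the LLM service; this is an assumption, not a given).
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return mean_nll, math.exp(mean_nll)


def batch_features(batch_logprobs):
    """Average both features over a few unlabeled samples (hypothetical helper)."""
    pairs = [nll_and_perplexity(lp) for lp in batch_logprobs]
    n = len(pairs)
    return {
        "mean_nll": sum(p[0] for p in pairs) / n,
        "mean_ppl": sum(p[1] for p in pairs) / n,
    }


if __name__ == "__main__":
    # Two toy responses with made-up token probabilities.
    samples = [
        [math.log(0.5)] * 4,
        [math.log(0.25)] * 2,
    ]
    print(batch_features(samples))
```

Feature vectors like these could then be passed to a meta-model that predicts task performance; which meta-models the paper uses is not stated in the visible part of the abstract.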

Code & Models

Repositories

WangCan1178/Plug-and-Play-Estimation (official)
