FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation

Bangzheng Li; Ben Zhou; Xingyu Fu; Fei Wang; Dan Roth; Muhao Chen

arXiv:2406.11243·cs.CL·June 18, 2024

TL;DR

FamiCom is a new metric combining familiarity and complexity to better estimate language model performance across tasks and domains, outperforming existing metrics.

Contribution

This work introduces FamiCom, a comprehensive, task-agnostic performance estimation metric that improves over familiarity-only measures by incorporating task complexity.

Findings

01. FamiCom achieves a 0.85 correlation with actual performance.

02. FamiCom outperforms existing metrics in task transfer scenarios.

03. Using FamiCom improves prompt and demonstration selection accuracy by over 7%.

Abstract

Language models have shown impressive in-context learning capabilities, which allow them to benefit from input prompts and perform better on downstream end tasks. Existing works investigate the mechanisms behind this observation and propose label-agnostic prompt metrics that can better estimate end-task performance. One popular approach uses perplexity to measure a model's familiarity with the prompt. While familiarity metrics such as perplexity show consistent improvements on in-domain tasks, we found that they cannot accurately estimate performance in complicated situations such as task or domain transfer scenarios. In this work, we propose a revised measure called FamiCom, which provides a more comprehensive measure for task-agnostic performance estimation. Specifically, FamiCom combines familiarity with complexity, the inherent difficulty of end tasks, which is…
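The abstract names perplexity as a common proxy for a model's familiarity with a prompt. As a minimal sketch of that baseline only (not FamiCom itself, whose formula is not given on this page), the snippet below computes perplexity from per-token log-probabilities. The function name `perplexity` and the example log-probabilities are hypothetical; in practice the values would come from whatever language model is being evaluated.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a sequence from its natural-log token probabilities.

    Uses the standard definition exp(-mean(log p)). Lower values suggest the
    model finds the text more predictable, i.e. more "familiar".
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_negative_log_likelihood = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_negative_log_likelihood)


# Hypothetical per-token probabilities for two prompts (illustration only;
# these are not results from the paper).
familiar_prompt = [math.log(0.5), math.log(0.25), math.log(0.5)]
unfamiliar_prompt = [math.log(0.05), math.log(0.1), math.log(0.02)]

print(perplexity(familiar_prompt))
print(perplexity(unfamiliar_prompt))
```

A familiarity-only metric would rank prompts by this score alone. According to the abstract, FamiCom additionally accounts for the complexity of the end task.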

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare