Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference

Ke Shen; Mayank Kejriwal

arXiv:2408.01935·cs.CL·August 6, 2024

Open Access

TL;DR

This paper introduces a framework for measuring the risks that arise when a large language model is over-confident or under-confident in its inferences, covering both the decision to abstain and the accuracy of the answers it does give on natural language inference tasks.

Contribution

The paper defines two types of risk, decision risk and composite risk, and proposes an experimental framework, consisting of a two-level inference architecture and accompanying metrics, for measuring them in both discriminative and generative LLMs.

Findings

01

Framework improves confidence in low-risk tasks by 20.1%

02

Framework skips 19.8% of high-risk tasks to avoid errors

03

Demonstrates utility on four commonsense reasoning datasets

Abstract

Despite their impressive performance, large language models (LLMs) such as ChatGPT are known to pose important risks. One such set of risks arises from misplaced confidence, whether over-confidence or under-confidence, that the models have in their inference. While the former is well studied, the latter is not, leading to an asymmetry in understanding the comprehensive risk of the model based on misplaced confidence. In this paper, we address this asymmetry by defining two types of risk (decision and composite risk), and proposing an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks in both discriminative and generative LLMs. The first level relies on a decision rule that determines whether the underlying language model should abstain from inference. The second level (which applies if the model does not abstain) is…
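The two-level architecture described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `two_level_inference`, `max_confidence_rule`, and `Prediction` are hypothetical, and the thresholded maximum-confidence rule is only an assumed stand-in for whatever decision rule the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Prediction:
    label: int         # index of the chosen candidate answer
    confidence: float  # model confidence in that answer, in [0, 1]


def two_level_inference(
    candidate_scores: Sequence[float],
    should_abstain: Callable[[Sequence[float]], bool],
) -> Optional[Prediction]:
    """Return None if the decision rule abstains, else the model's inference."""
    if not candidate_scores:
        raise ValueError("candidate_scores must not be empty")

    # Level 1: the decision rule decides whether the model should abstain.
    if should_abstain(candidate_scores):
        return None

    # Level 2: the model's inference (here, simply the top-scoring candidate).
    best = max(range(len(candidate_scores)), key=candidate_scores.__getitem__)
    return Prediction(label=best, confidence=candidate_scores[best])


def max_confidence_rule(threshold: float) -> Callable[[Sequence[float]], bool]:
    """Hypothetical decision rule (an assumption, not taken from the paper):
    abstain when the highest candidate confidence falls below `threshold`."""
    def rule(scores: Sequence[float]) -> bool:
        return max(scores) < threshold
    return rule


# Usage: abstain on an ambiguous instance, answer a clear-cut one.
rule = max_confidence_rule(0.6)
ambiguous = two_level_inference([0.40, 0.35, 0.25], rule)
clear = two_level_inference([0.10, 0.80, 0.10], rule)
```

Keeping the decision rule as a separate callable means different abstention strategies can be swapped in and compared without touching the inference step, which mirrors the separation between the two levels.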

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Sparse Evolutionary Training · Softmax · Dense Connections · Dropout · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay