How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?

Kazuma Hashimoto; Iftekhar Naim; Karthik Raman

arXiv:2212.10767·cs.CL·February 1, 2024

TL;DR

This paper investigates how beam search can enhance the reliability of confidence estimates in generative sequence labeling, revealing that top-k prediction statistics improve calibration over simple decoder probabilities.

Contribution

The paper proposes using statistics from beam search (the top-k predictions) to better calibrate span-level confidence estimates in generative sequence labeling models, an aspect that has received far less attention than labeling accuracy.
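To make the idea of span-level confidence from top-k predictions concrete, here is a minimal sketch. It is not the paper's exact method: it assumes each beam hypothesis comes with a sequence log-probability and an already-parsed set of (label, text) spans, and it scores a span by the normalized probability mass of the hypotheses that contain it. The function name `span_confidences` and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def span_confidences(hypotheses):
    """Aggregate span confidence over top-k beam hypotheses.

    hypotheses: list of (log_prob, spans) pairs, where spans is an
    iterable of (label, text) tuples parsed from one generated output.
    This input layout is an assumption made for illustration.
    """
    # Normalize sequence probabilities over the k hypotheses
    # (a softmax over log-probs, shifted by the max for stability).
    max_lp = max(lp for lp, _ in hypotheses)
    weights = [math.exp(lp - max_lp) for lp, _ in hypotheses]
    total = sum(weights)

    # A span's confidence is the share of probability mass held by
    # the hypotheses in which that span appears.
    conf = defaultdict(float)
    for weight, (_, spans) in zip(weights, hypotheses):
        for span in set(spans):
            conf[span] += weight / total
    return dict(conf)


# Toy top-3 beam for a slot-filling style input (made-up numbers).
beams = [
    (math.log(0.5), [("city", "Boston"), ("date", "Friday")]),
    (math.log(0.3), [("city", "Boston")]),
    (math.log(0.2), [("city", "New Boston"), ("date", "Friday")]),
]
scores = span_confidences(beams)
```

In this toy beam, a span that appears in several hypotheses accumulates more mass than one that appears only in a low-probability hypothesis, which is the intuition behind using top-k statistics rather than the probability of the single best output.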

Findings

01. Beam search statistics improve confidence calibration.

02. The decoder's output probabilities alone are not the best basis for well-calibrated confidence estimates.

03. The proposed method reduces calibration errors across six public datasets covering different tasks.
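Calibration error can be measured in several ways; this page does not say which metric the paper reports. As an illustration only, the sketch below computes expected calibration error (ECE) with equal-width bins, a common choice. The function name and the choice of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: floats in [0, 1], one per predicted span.
    correct: booleans, True where the predicted span was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Four spans predicted at 0.95 confidence, three of them correct:
# the model is overconfident by about 0.2 in that bin.
ece_overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, True, False]
)
```

A lower value means that stated confidence tracks observed accuracy more closely.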

Abstract

Sequence labeling is a core task in text understanding for IE/IR systems. Text generation models have increasingly become the go-to solution for such tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect of vital practical importance has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best in realizing well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach, which…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Scientific Computing and Data Management