Re-Examining Calibration: The Case of Question Answering

Chenglei Si; Chen Zhao; Sewon Min; Jordan Boyd-Graber

arXiv:2205.12507 · cs.CL · October 25, 2022 · 1 citation

TL;DR

This paper introduces a new calibration metric, MacroCE, and a calibration method, ConsCal, for open-domain question answering, highlighting limitations of traditional calibration evaluation and demonstrating the need for better calibration techniques.

Contribution

The paper proposes MacroCE as a more effective calibration metric and introduces ConsCal, a novel calibration method leveraging prediction consistency across checkpoints.
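
To make the idea of "prediction consistency across checkpoints" concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name `consistency_confidence`, the exact-match comparison, the binary 0/1 output, and the `min_agree` threshold are all hypothetical choices made for this example.

```python
from typing import Sequence


def consistency_confidence(
    checkpoint_answers: Sequence[str],
    final_answer: str,
    min_agree: int,
) -> float:
    """Hypothetical consistency-based confidence.

    Counts how many earlier checkpoints produced the same answer as the
    final model, and returns high confidence only if that count reaches
    `min_agree` (an assumed, tunable threshold).
    """
    agree = sum(1 for answer in checkpoint_answers if answer == final_answer)
    return 1.0 if agree >= min_agree else 0.0


# Usage: answers from four hypothetical checkpoints for one question.
stable = consistency_confidence(["Paris", "Paris", "Paris", "Lyon"], "Paris", min_agree=3)
unstable = consistency_confidence(["Rome", "Paris", "Lyon", "Nice"], "Paris", min_agree=3)
```

The intuition being sketched is that an answer the model settles on early and keeps across training is more likely to be correct than one that keeps changing. How the threshold would be tuned, and how answers would be compared, are left open here.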

Findings

01. Traditional calibration methods do not improve MacroCE scores significantly.

02. MacroCE better captures the quality of confidence estimates in QA models.

03. ConsCal outperforms existing calibration techniques under the new metric.

Abstract

For users to trust model predictions, they need to understand model outputs, particularly their confidence - calibration aims to adjust (calibrate) models' confidence to match expected accuracy. We argue that the traditional calibration evaluation does not promote effective calibrations: for example, it can encourage always assigning a mediocre confidence score to all predictions, which does not help users distinguish correct predictions from wrong ones. Building on those observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions. Focusing on the practical application of open-domain question answering, we examine conventional calibration methods applied on the widely-used retriever-reader pipeline, all of which do not bring significant gains under our new…
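
The abstract's point about a constant "mediocre" confidence can be checked with a small numeric sketch. The code below compares a standard binned expected calibration error (ECE) with a macro-averaged error that treats correct and wrong predictions as two equally weighted groups. The macro formula used here (the mean of `1 - confidence` over correct predictions and of `confidence` over wrong predictions, averaged with equal weight) is an assumption made for illustration; the abstract does not give the definition of MacroCE, so consult the paper for the exact form. The function names and toy data are likewise hypothetical.

```python
from typing import List, Sequence


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Standard equal-width binned ECE: weighted |accuracy - mean confidence|."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(conf):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    total = len(conf)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


def macro_calibration_error(conf: Sequence[float], correct: Sequence[bool]) -> float:
    """Assumed macro-averaged error over correct and wrong predictions."""
    pos = [1.0 - c for c, ok in zip(conf, correct) if ok]   # want high confidence
    neg = [c for c, ok in zip(conf, correct) if not ok]     # want low confidence
    if not pos or not neg:
        raise ValueError("need at least one correct and one wrong prediction")
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))


# Toy data: 10 predictions, 4 correct (40% accuracy).
correct = [True] * 4 + [False] * 6

# Strategy A: always output the overall accuracy as the confidence.
constant = [0.4] * 10
# Strategy B: high confidence on correct answers, low on wrong ones.
discriminative = [0.9] * 4 + [0.1] * 6

ece_constant = expected_calibration_error(constant, correct)
ece_discriminative = expected_calibration_error(discriminative, correct)
macro_constant = macro_calibration_error(constant, correct)
macro_discriminative = macro_calibration_error(discriminative, correct)
```

On this toy data, binned ECE prefers the constant strategy even though it gives users no way to tell right answers from wrong ones, while the macro-averaged error prefers the discriminative strategy. That is the kind of mismatch the abstract describes.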

Code & Models

Repositories

noviscl/calibrateqa (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Access Control and Trust · Advanced Graph Neural Networks