Calibration of Machine Reading Systems at Scale

Shehzaad Dhuliawala; Leonard Adolphs; Rajarshi Das; Mrinmaya Sachan

arXiv:2203.10623·cs.CL·May 24, 2022

TL;DR

This paper investigates the calibration of open-domain machine reading systems. It shows that existing calibration techniques struggle in this setting and proposes simple, scalable extensions that improve confidence estimates, which helps the system handle unanswerable or out-of-distribution questions.

Contribution

The paper introduces scalable extensions to existing calibration techniques, tailored for complex machine reading systems with retrieval and deep reading components.

Findings

01 Calibration techniques are challenging to scale to complex systems.

02 Proposed methods improve calibration in open-domain question answering.

03 Better confidence estimates help in identifying unanswerable questions.

Abstract

In typical machine learning systems, an estimate of the probability of the prediction is used to assess the system's confidence in the prediction. This confidence measure is usually uncalibrated; i.e., the system's confidence in the prediction does not match the true probability of the predicted output. In this paper, we present an investigation into calibrating open-setting machine reading systems such as open-domain question answering and claim verification systems. We show that calibrating such complex systems, which contain discrete retrieval and deep reading components, is challenging, and that current calibration techniques fail to scale to these settings. We propose simple extensions to existing calibration approaches that allow us to adapt them to these settings. Our experimental results reveal that the approach works well, and can be useful to selectively predict answers when question…
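To make the notion of an "uncalibrated" confidence concrete, the sketch below computes the expected calibration error (ECE), a common way to measure the gap between stated confidence and observed accuracy. It is an illustration only, not the paper's method: the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all assumptions made here.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Illustrative ECE with equal-width bins (an assumption, not the paper's setup).

    confidences: predicted probability of each prediction, in [0, 1].
    correct: whether each prediction was actually right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # sum of confidences per bin
    hit_sum = [0.0] * n_bins   # number of correct predictions per bin
    count = [0] * n_bins       # number of predictions per bin

    for c, y in zip(confidences, correct):
        # Map a confidence to its bin; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if y else 0.0
        count[idx] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue
        avg_conf = conf_sum[k] / count[k]
        accuracy = hit_sum[k] / count[k]
        # Weight each bin's confidence/accuracy gap by its share of the data.
        ece += (count[k] / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Four predictions made with 90% confidence, of which three are right.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A perfectly calibrated system would score 0: among all predictions made with confidence 0.9, exactly 90% would be correct. In the usage example the system claims 0.9 but is right only 75% of the time, so it is overconfident.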
