Making Pre-trained Language Models both Task-solvers and Self-calibrators

Yangyi Chen; Xingyao Wang; Heng Ji

arXiv:2307.11316 · cs.CL · July 24, 2023



TL;DR

This paper introduces LM-TOAST, a training algorithm that enables pre-trained language models to serve as both effective task-solvers and self-calibrators, especially under limited data and distribution shifts, improving confidence estimation without sacrificing task accuracy.
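"Confidence estimation" quality is commonly summarized with expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. This page does not say which metrics the paper reports, so the following is only a generic sketch under stated assumptions (equal-width bins, a hypothetical function name `expected_calibration_error`), not the authors' evaluation code.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Hypothetical helper: equal-width binned ECE.

    Returns the size-weighted mean of |accuracy - average confidence|
    over non-empty bins.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


# An overconfident model: ~0.94 average confidence but only 50% accuracy.
print(expected_calibration_error([0.95, 0.9, 0.92, 0.97], [True, False, False, True]))
```

A large ECE with high confidences, as in the usage line, is the "overconfident in wrong predictions" pattern described in the abstract; the bin-boundary convention is a design choice and varies between implementations.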

Contribution

The paper proposes LM-TOAST, a novel training method that enhances PLMs' ability to self-calibrate with limited data, addressing challenges like data imbalance and distribution shifts.

Findings

01

LM-TOAST improves confidence calibration in PLMs.

02

It maintains original task performance while enhancing calibration.

03

LM-TOAST is also effective in downstream applications such as selective classification and adversarial defense.
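Selective classification uses a confidence score to decide when the model should abstain, trading coverage (the fraction of inputs answered) against risk (the error rate on answered inputs). The sketch below only illustrates that trade-off with a fixed confidence threshold; the function name `selective_risk_coverage` and the thresholding rule are illustrative assumptions, not the paper's evaluation protocol.

```python
from typing import Sequence, Tuple


def selective_risk_coverage(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Hypothetical helper: abstain when confidence < threshold.

    Returns (coverage, selective risk), where risk is the error rate
    among the predictions that were not abstained on.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0, 0.0
    # Keep only the predictions the model is confident enough to answer.
    accepted = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(accepted) / len(confidences)
    if not accepted:
        return 0.0, 0.0
    risk = sum(1 for ok in accepted if not ok) / len(accepted)
    return coverage, risk


confs = [0.95, 0.6, 0.85, 0.4]
hits = [True, False, True, False]
print(selective_risk_coverage(confs, hits, threshold=0.8))  # (0.5, 0.0)
print(selective_risk_coverage(confs, hits, threshold=0.5))  # (0.75, 0.333...)
```

Raising the threshold lowers coverage; better-calibrated confidence scores are what make the accepted subset's risk drop as coverage shrinks.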

Abstract

Pre-trained language models (PLMs) serve as backbones for various real-world systems. For high-stakes applications, it is equally essential to have reasonable confidence estimates for their predictions. While the vanilla confidence scores of PLMs can already be used effectively, PLMs consistently become overconfident in their wrong predictions, which is undesirable in practice. Previous work shows that introducing an extra calibration task can mitigate this issue. The basic idea involves acquiring additional data to train models to predict the confidence of their initial predictions. However, that work only demonstrates the feasibility of this kind of method, assuming that abundant extra samples are available for the introduced calibration task. In this work, we consider the practical scenario in which we need to effectively utilize training samples to make PLMs both task-solvers and…
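The calibration-task idea from prior work can be illustrated by turning a model's task predictions into binary training targets: 1 if the initial prediction matched the gold label, 0 otherwise. This is a minimal sketch of that data construction only, with hypothetical names (`CalibrationExample`, `build_calibration_examples`, `label_counts`); it does not reproduce LM-TOAST's training procedure, which this page truncates. The label counts also show how an accurate task model yields mostly "correct" targets, one form of the data imbalance the contribution mentions.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class CalibrationExample:
    text: str        # original task input
    predicted: str   # the model's initial prediction on that input
    is_correct: int  # calibration target: 1 if prediction == gold, else 0


def build_calibration_examples(
    inputs: Sequence[str],
    predictions: Sequence[str],
    gold_labels: Sequence[str],
) -> List[CalibrationExample]:
    """Hypothetical helper: pair each prediction with a correctness label."""
    if not (len(inputs) == len(predictions) == len(gold_labels)):
        raise ValueError("inputs, predictions and gold_labels must align")
    return [
        CalibrationExample(x, p, int(p == y))
        for x, p, y in zip(inputs, predictions, gold_labels)
    ]


def label_counts(examples: Sequence[CalibrationExample]) -> Dict[str, int]:
    """Count positive (correct) and negative (incorrect) calibration targets."""
    pos = sum(e.is_correct for e in examples)
    return {"correct": pos, "incorrect": len(examples) - pos}


examples = build_calibration_examples(
    inputs=["great film", "dull plot", "not bad at all", "loved it"],
    predictions=["pos", "neg", "neg", "pos"],
    gold_labels=["pos", "neg", "pos", "pos"],
)
print(label_counts(examples))  # {'correct': 3, 'incorrect': 1}
```

A model trained on such examples learns to score how likely its own prediction is to be right; how many extra samples are needed for this, and how to avoid them, is the question the abstract raises.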


Code & Models

Repositories

yangyi-chen/lm-toast (PyTorch, official)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning