Posterior calibration and exploratory analysis for natural language processing models
Khanh Nguyen, Brendan O'Connor

TL;DR
This paper emphasizes the importance of evaluating the calibration of probabilistic NLP models and introduces methods for assessing and improving their uncertainty estimates, enhancing trustworthiness in NLP applications.
Contribution
It presents a novel approach to analyze model calibration in NLP and introduces a coreference sampling algorithm for confidence interval estimation in event extraction.
Findings
Many NLP models are miscalibrated, affecting trust in their predictions.
The proposed calibration analysis method is applied to compare the miscalibration of several commonly used models.
The coreference sampling algorithm can produce confidence intervals for a political event extraction task.
Abstract
Many models in natural language processing define probabilistic distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, as to whether probabilities correspond to empirical frequencies, and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a user when to trust and not trust the NLP analysis. We present a method to analyze calibration, and apply it to compare the miscalibration of several commonly used models. We also contribute a coreference sampling algorithm that can create confidence intervals for a political event extraction task.
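To make concrete the idea that predicted probabilities should "correspond to empirical frequencies", here is a minimal Python sketch of a binned calibration check for binary predictions. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function names `calibration_table` and `calibration_error`, the choice of equal-width bins, and the root-mean-square summary are all hypothetical choices made for this example.

```python
import numpy as np


def calibration_table(probs, labels, n_bins=5):
    """Compare mean predicted probability with empirical frequency per bin.

    Assumption: equal-width bins over [0, 1]; other binning schemes are possible.
    Returns a list of (bin_index, count, mean_predicted, empirical_frequency).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges maps each probability to a bin in 0..n_bins-1.
    bin_idx = np.digitize(probs, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = bin_idx == b
        if not mask.any():
            continue  # skip empty bins
        rows.append((b, int(mask.sum()),
                     float(probs[mask].mean()),
                     float(labels[mask].mean())))
    return rows


def calibration_error(probs, labels, n_bins=5):
    """Count-weighted root-mean-square gap between prediction and frequency."""
    rows = calibration_table(probs, labels, n_bins)
    total = sum(count for _, count, _, _ in rows)
    sq = sum(count / total * (mean_p - freq) ** 2
             for _, count, mean_p, freq in rows)
    return float(np.sqrt(sq))


if __name__ == "__main__":
    # Toy data: predictions of 0.1 and 0.9 that are always right are
    # slightly underconfident, so each bin shows a gap of about 0.1.
    probs = [0.1, 0.1, 0.9, 0.9]
    labels = [0, 0, 1, 1]
    for row in calibration_table(probs, labels):
        print(row)
    print(calibration_error(probs, labels))
```

A well-calibrated model yields bins whose mean predicted probability is close to the observed frequency of the positive label, so the error approaches zero. With few examples per bin the estimate is noisy, which is one reason the bin count is left as a parameter.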
