Probing neural language models for understanding of words of estimative probability

Damien Sileo; Marie-Francine Moens

arXiv:2211.03358·cs.CL·June 27, 2023


TL;DR

This paper investigates whether neural language models understand words of estimative probability (WEP) and their associated probability levels, revealing current models' limitations and potential improvements through fine-tuning.

Contribution

It introduces datasets and tasks to evaluate neural models' grasp of WEP probabilities and probabilistic reasoning, highlighting their current shortcomings and the benefits of fine-tuning.

Findings

1. Off-the-shelf models fail to accurately predict WEP probabilities.
2. Fine-tuning improves models' ability to understand and reason with WEP.
3. Models still struggle with complex probabilistic reasoning tasks.

Abstract

Words of estimative probability (WEP) are expressions of a statement's plausibility (probably, maybe, likely, doubt, unlikely, impossible...). Multiple surveys demonstrate the agreement of human evaluators when assigning numerical probability levels to WEP. For example, highly likely corresponds to a median chance of 0.90±0.08 in Fagen-Ulmschneider (2015)'s survey. In this work, we measure the ability of neural language processing models to capture the consensual probability level associated with each WEP. Firstly, we use the UNLI dataset (Chen et al., 2020), which associates premises and hypotheses with their perceived joint probability p, to construct prompts, e.g. "[PREMISE]. [WEP], [HYPOTHESIS].", and assess whether language models can predict whether the WEP consensual probability level is close to p. Secondly, we construct a dataset of WEP-based probabilistic reasoning, to…
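The prompt template quoted in the abstract can be illustrated with a minimal Python sketch. Everything here beyond the template itself and the 0.90 value for "highly likely" is an assumption made for illustration: the function names `build_prompt` and `is_consistent`, the placeholder probabilities for the other WEP, and the `tolerance` threshold are hypothetical and are not taken from the paper.

```python
# Median probability per WEP. Only "highly likely" (0.90) comes from the
# abstract; the other values are illustrative placeholders, not survey data.
WEP_PROBABILITY = {
    "highly likely": 0.90,
    "maybe": 0.50,       # placeholder value (assumption)
    "impossible": 0.00,  # placeholder value (assumption)
}


def build_prompt(premise: str, wep: str, hypothesis: str) -> str:
    """Fill the "[PREMISE]. [WEP], [HYPOTHESIS]." template from the abstract."""
    premise = premise.rstrip(". ")        # avoid doubled periods
    hypothesis = hypothesis.rstrip(". ")
    return f"{premise}. {wep.capitalize()}, {hypothesis}."


def is_consistent(wep: str, p: float, tolerance: float = 0.1) -> bool:
    """Return True when the WEP's consensual probability is close to p.

    The tolerance is a hypothetical threshold chosen for this sketch;
    the abstract does not state how closeness is measured.
    """
    return abs(WEP_PROBABILITY[wep] - p) <= tolerance


if __name__ == "__main__":
    prompt = build_prompt("A man is cooking", "highly likely", "he is in a kitchen")
    print(prompt)
    print(is_consistent("highly likely", 0.85))
```

In this sketch, the label a model would be asked to predict is the boolean returned by `is_consistent`, given the text returned by `build_prompt`.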


Code & Models

Datasets

sileod/probability_words_nli
dataset · 179 downloads


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Test