How to Compute the Probability of a Word

Tiago Pimentel; Clara Meister

arXiv:2406.14561·cs.CL·October 15, 2024

TL;DR

This paper clarifies the correct methods for computing word probabilities from subword language models, revealing widespread errors in prior research and demonstrating the impact of these corrections on linguistic analysis outcomes.

Contribution

It derives the correct methods for computing word probabilities from subword language models and highlights issues that arise with beginning-of-word (bow)-marking tokenisers, such as those used by the GPT family.

Findings

01 Incorrect probability computations are common in recent linguistic studies.

02 Correcting these errors significantly alters linguistic analysis results.

03 The corrections change measured outcomes in sentence comprehension and lexical optimisation analyses.

Abstract

Language models (LMs) estimate a probability distribution over strings in a natural language; these distributions are crucial for computing perplexity and surprisal in linguistics research. While we are usually concerned with measuring these values for words, most LMs operate over subwords. Although seemingly straightforward, accurately computing probabilities over one unit given probabilities over the other requires care. Indeed, we show here that many recent linguistic studies have been incorrectly computing these values. This paper derives the correct methods for computing word probabilities, highlighting issues when relying on language models that use beginning-of-word (bow)-marking tokenisers, e.g., the GPT family. Empirically, we show that correcting the widespread bug in probability computations affects measured outcomes in sentence comprehension and lexical optimisation analyses.
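To make the bow-marking issue concrete, here is a minimal Python sketch on a toy model. It is not the paper's released code. It assumes the correction has this form: with a bow-marking tokeniser, a word is only known to have ended once the next subword starts a new word (or is end-of-sequence). The naive product of subword probabilities is therefore multiplied by the probability mass on bow-marked subwords after the word and divided by the same mass before it. The names `toy_lm`, `TOY_TABLE`, `BOW`, and the `NextDist` interface are hypothetical and exist only for illustration.

```python
import math
from typing import Callable, Dict, Sequence, Set

# Hypothetical interface: given a subword context, return the next-subword distribution.
NextDist = Callable[[Sequence[str]], Dict[str, float]]


def naive_word_logprob(lm: NextDist, context: Sequence[str], word: Sequence[str]) -> float:
    """Chain-rule sum of the log-probabilities of the word's subwords."""
    ctx = list(context)
    total = 0.0
    for tok in word:
        total += math.log(lm(ctx)[tok])
        ctx.append(tok)
    return total


def bow_mass(lm: NextDist, context: Sequence[str], bow_tokens: Set[str]) -> float:
    """Probability that the next subword begins a new word (or ends the sequence)."""
    return sum(p for tok, p in lm(context).items() if tok in bow_tokens)


def corrected_word_logprob(
    lm: NextDist, context: Sequence[str], word: Sequence[str], bow_tokens: Set[str]
) -> float:
    """Naive log-probability, adjusted by the bow mass after and before the word.

    Assumed form of the correction for bow-marking tokenisers.
    """
    naive = naive_word_logprob(lm, context, word)
    after = bow_mass(lm, list(context) + list(word), bow_tokens)
    before = bow_mass(lm, context, bow_tokens)
    return naive + math.log(after) - math.log(before)


# Toy next-subword table (illustrative numbers only); "Ġ" marks a beginning of word.
TOY_TABLE = {
    (): {"Ġcat": 0.5, "Ġsat": 0.5},
    ("Ġcat",): {"s": 0.4, "Ġsat": 0.5, "<eos>": 0.1},
}


def toy_lm(context: Sequence[str]) -> Dict[str, float]:
    # Any context not listed in the table ends the sequence with certainty.
    return TOY_TABLE.get(tuple(context), {"<eos>": 1.0})


# Assumption: end-of-sequence is counted together with the bow-marked subwords.
BOW = {"Ġcat", "Ġsat", "<eos>"}

p_naive = math.exp(naive_word_logprob(toy_lm, [], ["Ġcat"]))
p_fixed = math.exp(corrected_word_logprob(toy_lm, [], ["Ġcat"], BOW))
print(p_naive, p_fixed)
```

In this toy setting the naive value for "cat" (about 0.5) also includes the probability of continuing to "cats". The adjusted value (about 0.3) excludes it, and the adjusted probabilities of "cat", "cats", and "sat" sum to one.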

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Byte Pair Encoding · Attention Dropout · Dropout · Adam · Linear Warmup With Cosine Annealing · Linear Layer · Dense Connections