How Well Does First-Token Entropy Approximate Word Entropy as a Psycholinguistic Predictor?

Christian Clark; Byung-Doh Oh; William Schuler

arXiv:2507.22209·cs.CL·July 31, 2025

TL;DR

This paper examines how well first-token entropy approximates true word entropy as a psycholinguistic predictor. It finds that the common approximation underestimates word entropy and can distort results in reading-time regressions, and it urges caution in its use.

Contribution

The study introduces Monte Carlo estimates of word entropy that account for variable token spans, highlighting limitations of first-token entropy approximations in psycholinguistic research.

Findings

01. First-token entropy often underestimates true word entropy.

02. Monte Carlo estimates provide more accurate entropy measurements.

03. Using first-token entropy can distort psycholinguistic predictions.
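
The first finding can be motivated with a short derivation. This is a sketch under an assumption that is not stated in the summary above: each word W has a single deterministic tokenization, so its first token T_1 is a function of W. The symbols W, T_1, and c (the preceding context) are introduced here for illustration only.

```latex
\begin{align*}
H(W \mid c) &= H(W, T_1 \mid c)
  && \text{since } T_1 \text{ is determined by } W \\
            &= H(T_1 \mid c) + H(W \mid T_1, c)
  && \text{chain rule for entropy} \\
            &\ge H(T_1 \mid c)
  && \text{since } H(W \mid T_1, c) \ge 0
\end{align*}
```

Under that assumption, equality holds only when the first token fully determines the word, so first-token entropy is a lower bound on word entropy and never an overestimate.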

Abstract

Contextual entropy is a psycholinguistic measure capturing the anticipated difficulty of processing a word just before it is encountered. Recent studies have tested for entropy-related effects as a potential complement to well-known effects from surprisal. For convenience, entropy is typically estimated based on a language model's probability distribution over a word's first subword token. However, this approximation results in underestimation and potential distortion of true word entropy. To address this, we generate Monte Carlo (MC) estimates of word entropy that allow words to span a variable number of tokens. Regression experiments on reading times show divergent results between first-token and MC word entropy, suggesting a need for caution in using first-token approximations of contextual entropy.
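
The contrast between the two estimators can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_LM`, the `</w>` end-of-word marker, and all function names are hypothetical, and a hand-written lookup table stands in for a real language model. Real subword tokenizers usually signal word boundaries differently (for example, through leading whitespace on the next token), which this toy setup ignores.

```python
import math
import random

END = "</w>"  # hypothetical end-of-word marker (an assumption of this sketch)

# Hypothetical toy "language model" for one fixed context: maps the tokens
# generated so far inside the upcoming word to a next-token distribution.
# Implied word probabilities: "the" = 0.5, "undo" = 0.25, "untie" = 0.25.
TOY_LM = {
    (): {"un": 0.5, "the": 0.5},
    ("un",): {"do": 0.5, "tie": 0.5},
    ("un", "do"): {END: 1.0},
    ("un", "tie"): {END: 1.0},
    ("the",): {END: 1.0},
}


def first_token_entropy(lm):
    """Entropy (bits) of the distribution over the word's first token only."""
    dist = lm[()]
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def sample_word(lm, rng):
    """Sample tokens until END; return the word's tokens and its log2 probability."""
    prefix = ()
    log_prob = 0.0
    while True:
        dist = lm[prefix]
        tokens = list(dist)
        token = rng.choices(tokens, weights=[dist[t] for t in tokens])[0]
        log_prob += math.log2(dist[token])
        if token == END:
            return prefix, log_prob
        prefix += (token,)


def mc_word_entropy(lm, n_samples, seed=0):
    """Monte Carlo estimate of word entropy: the mean of -log2 P(word) over samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        _, log_prob = sample_word(lm, rng)
        total -= log_prob
    return total / n_samples


if __name__ == "__main__":
    print("first-token entropy (bits):", first_token_entropy(TOY_LM))
    print("MC word entropy (bits):", mc_word_entropy(TOY_LM, 5000))
```

In this toy distribution the exact word entropy is 1.5 bits, while the first-token entropy is 1.0 bit, because the token "un" hides the remaining uncertainty between "undo" and "untie". The Monte Carlo estimate approaches the exact value as the number of samples grows.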
