What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length

Lindia Tjuatja; Graham Neubig; Tal Linzen; Sophie Hao

arXiv:2411.02528 · cs.CL · June 4, 2025


TL;DR

This paper introduces MORCELA, a data-driven method for adjusting language model scores to better match human acceptability judgments by accounting for length and unigram frequency effects, outperforming previous approaches.

Contribution

The paper proposes MORCELA, a novel linking theory with learned parameters for length and frequency adjustments, improving alignment between LM scores and human judgments.

Findings

01 MORCELA outperforms SLOR across transformer LMs.

02 Larger models require less adjustment for unigram frequency.

03 Larger LMs better predict rare words, reducing frequency effects.

Abstract

When comparing the linguistic capabilities of language models (LMs) with humans using LM probabilities, factors such as the length of the sequence and the unigram frequency of lexical items have a significant effect on LM probabilities in ways that humans are largely robust to. Prior works in comparing LM and human acceptability judgments treat these effects uniformly across models, making a strong assumption that models require the same degree of adjustment to control for length and unigram frequency effects. We propose MORCELA, a new linking theory between LM scores and acceptability judgments where the optimal level of adjustment for these effects is estimated from data via learned parameters for length and unigram frequency. We first show that MORCELA outperforms a commonly used linking theory for acceptability - SLOR (Pauls and Klein, 2012; Lau et al. 2017) - across two families of…
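The difference between a fixed adjustment and a learned one can be illustrated with a minimal Python sketch; it is not the authors' implementation. It assumes the standard SLOR definition, (log p_LM(s) - log p_unigram(s)) / |s|, and an illustrative parameterized variant with a frequency weight `beta` and a length-related offset `gamma`. The exact MORCELA parameterization and fitting procedure are not given in this excerpt, so the function names, the form of the variant, the grid search used for fitting, and any input numbers are assumptions.

```python
from itertools import product
from math import sqrt
from typing import Sequence, Tuple

# One sentence: (LM log-probability, unigram log-probability, length in tokens).
Item = Tuple[float, float, int]


def adjusted_score(lm_logprob: float, unigram_logprob: float, length: int,
                   beta: float, gamma: float) -> float:
    """Hypothetical parameterized score (an assumption, not the paper's formula).

    beta scales the unigram-frequency correction; gamma is an offset that is
    divided by length, so it changes how strongly length is penalized.
    """
    if length <= 0:
        raise ValueError("length must be a positive token count")
    return (lm_logprob - beta * unigram_logprob + gamma) / length


def slor(lm_logprob: float, unigram_logprob: float, length: int) -> float:
    """Standard SLOR: the special case beta = 1, gamma = 0."""
    return adjusted_score(lm_logprob, unigram_logprob, length, beta=1.0, gamma=0.0)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def fit_by_grid(items: Sequence[Item], human: Sequence[float],
                betas: Sequence[float], gammas: Sequence[float]
                ) -> Tuple[float, float, float]:
    """Pick the (beta, gamma) on a grid whose scores correlate best with
    human ratings. Grid search is a stand-in chosen for simplicity here."""
    best = None
    for beta, gamma in product(betas, gammas):
        scores = [adjusted_score(lp, up, n, beta, gamma) for lp, up, n in items]
        r = pearson(scores, human)
        if best is None or r > best[2]:
            best = (beta, gamma, r)
    if best is None:
        raise ValueError("betas and gammas must be non-empty")
    return best
```

Under these assumptions, calling `fit_by_grid` with a grid that contains `beta = 1.0` and `gamma = 0.0` can never yield a lower correlation than SLOR on the same data, because SLOR is one of the candidates. A fitted `beta` below 1 would correspond to a model needing less frequency adjustment than SLOR applies.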


Code & Models

Repositories

lindiatjuatja/morcela (PyTorch, official)

Videos

What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length (Underline)

Taxonomy

Topics: Experimental Learning in Engineering