Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models

Carina Kauf; Emmanuele Chersoni; Alessandro Lenci; Evelina Fedorenko; and Anna A. Ivanova

arXiv:2403.14859 · cs.CL · October 22, 2024 · 3 cites

TL;DR

This paper demonstrates that log probabilities from language models estimate semantic plausibility more reliably than zero-shot prompting, and that instruction tuning has minimal impact on this capability.

Contribution

It provides empirical evidence that LogProbs are a more consistent measure of semantic plausibility than prompting in both base and instruction-tuned language models.

Findings

01. LogProbs outperform prompting in measuring plausibility.

02. Instruction tuning does not significantly change LogProbs sensitivity.

03. Context modulates LogProbs in expected ways, aligning with human judgments.

Abstract

Semantic plausibility (e.g. knowing that "the actor won the award" is more likely than "the actor won the battle") serves as an effective proxy for general world knowledge. Language models (LMs) capture vast amounts of world knowledge by learning distributional patterns in text, accessible via log probabilities (LogProbs) they assign to plausible vs. implausible outputs. The new generation of instruction-tuned LMs can now also provide explicit estimates of plausibility via prompting. Here, we evaluate the effectiveness of LogProbs and basic prompting to measure semantic plausibility, both in single-sentence minimal pairs (Experiment 1) and short context-dependent scenarios (Experiment 2). We find that (i) in both base and instruction-tuned LMs, LogProbs offers a more reliable measure of semantic plausibility than direct zero-shot prompting, which yields inconsistent and often poor…
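As a rough illustration of the minimal-pair comparison described in the abstract, the sketch below scores two sentences by summing per-token log probabilities and checks which one receives the higher score. It is not the authors' code: a toy add-one-smoothed bigram model stands in for a real LM, and the corpus and the names `CORPUS`, `train_bigram`, and `sentence_logprob` are hypothetical and chosen only for this example. A real LM conditions each token on the full preceding context rather than on one previous word.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for LM training data (illustration only).
CORPUS = [
    "the actor won the award",
    "the actress won the award",
    "the soldier won the battle",
    "the actor accepted the award",
]

def train_bigram(corpus):
    """Count bigrams and their left contexts; return counts and vocabulary."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        contexts.update(tokens[:-1])                 # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {tok for s in corpus for tok in s.split()}
    return contexts, bigrams, vocab

def sentence_logprob(sentence, model):
    """Sum of log P(token | previous token) with add-one smoothing."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, tok in zip(tokens[:-1], tokens[1:]):
        prob = (bigrams[(prev, tok)] + 1) / (contexts[prev] + len(vocab))
        total += math.log(prob)
    return total

model = train_bigram(CORPUS)
plausible = sentence_logprob("the actor won the award", model)
implausible = sentence_logprob("the actor won the battle", model)

# A minimal pair counts as correctly ranked when the plausible
# sentence receives the higher summed log probability.
print(plausible > implausible)
```

Because the two sentences differ only in their final word, the score gap comes entirely from the last token's conditional probability, which is the property that makes minimal pairs convenient for this kind of comparison.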

Code & Models

Repositories

carina-kauf/llm-plaus-prob
none · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Balanced Selection