What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction

Charlie Wyatt; Aditya Joshi; Flora Salim

arXiv:2508.07702·cs.CL·August 12, 2025

Open Access

TL;DR

This paper evaluates three commercial LLMs on predicting a missing sentence in narrative, procedural, and expository text. The models perform poorly, particularly in low-structured domains, which points to a gap in global coherence despite strong local fluency.

Contribution

The paper introduces the Masked Sentence Prediction (MSP) task and assesses three commercial LLMs across narrative, procedural, and expository domains, highlighting their limitations in predicting a full sentence from its surrounding context.

Findings

01. LLMs perform poorly on masked sentence prediction in low-structured domains.

02. Current models excel in local fluency but lack global coherence.

03. There is a significant gap in LLM capabilities for reconstructive tasks.

Abstract

Transformer-based models primarily rely on Next Token Prediction (NTP), which predicts the next token in a sequence based on the preceding context. However, NTP's focus on single-token prediction often limits a model's ability to plan ahead or maintain long-range coherence, raising questions about how well LLMs can predict longer contexts, such as full sentences within structured documents. While NTP encourages local fluency, it provides no explicit incentive to ensure global coherence across sentence boundaries, an essential skill for reconstructive or discursive tasks. To investigate this, we evaluate three commercial LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Flash) on Masked Sentence Prediction (MSP), the task of infilling a randomly removed sentence, using documents from three domains: ROCStories (narrative), Recipe1M (procedural), and Wikipedia (expository). We assess both fidelity…
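
To make the task definition concrete, the following is a minimal sketch of how a single MSP example could be constructed: split a document into sentences, remove one at random, and keep it as the prediction target. It is an illustration under stated assumptions, not the authors' pipeline. The placeholder string `MASK_TOKEN`, the helper names `split_sentences` and `make_msp_example`, the naive regex splitter, and the sample story are all hypothetical and do not come from the paper.

```python
import random
import re

# Hypothetical placeholder marking the removed sentence (not from the paper).
MASK_TOKEN = "<MASKED_SENTENCE>"


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_msp_example(text, rng):
    """Remove one randomly chosen sentence.

    Returns (masked_text, target_sentence, index_of_removed_sentence).
    """
    sentences = split_sentences(text)
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to mask one")
    idx = rng.randrange(len(sentences))
    target = sentences[idx]
    masked = sentences[:idx] + [MASK_TOKEN] + sentences[idx + 1:]
    return " ".join(masked), target, idx


# Made-up sample text in the style of a short narrative.
story = (
    "Tom wanted a pet. He went to the shelter. "
    "He met a small grey cat. Tom took the cat home."
)
masked_text, target, idx = make_msp_example(story, random.Random(0))
print(masked_text)
print("target:", target)
```

The masked text would then be given to a model with an instruction to fill in the placeholder, and the model's output compared against `target`. A seeded `random.Random` is used here only so that the chosen sentence is reproducible across runs.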

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education