Context Is Not Comprehension

Alex Pan; Mary-Anne Williams

arXiv:2506.04907 · cs.CL · June 13, 2025

TL;DR

The paper introduces Verbose ListOps (VLO), a benchmark that embeds deterministic ListOps computations in narrative text and scores every intermediate result, so it tests multi-step reasoning and state tracking rather than recall of explicit facts.

Contribution

VLO contributes step-level evaluation of multi-step reasoning in language models, shifting assessment from sheer context length toward comprehension. Its generation pipeline is task-agnostic and can embed any deterministically verifiable reasoning schema in narrative form.

Findings

01

Models that solve raw ListOps with approximately 100% accuracy collapse on VLO after only 10,000 tokens

02

VLO exposes where a model's reasoning chain first diverges from the correct intermediate results

03

VLO's generation pipeline is task-agnostic and can embed any deterministically verifiable reasoning schema
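
The idea of locating a reasoning chain's first divergence can be illustrated with a minimal sketch. It assumes that both the reference computation and the model's answer can be reduced to ordered lists of integer intermediate results. This listing does not describe the benchmark's actual scoring format, so the function name `first_divergence` and the list representation are hypothetical.

```python
from typing import Optional, Sequence


def first_divergence(expected: Sequence[int],
                     predicted: Sequence[int]) -> Optional[int]:
    """Return the index of the first step where the two chains differ.

    Hypothetical helper: assumes each chain is an ordered list of
    integer intermediate results. Returns None when the chains match.
    """
    # Compare the steps that both chains contain, in order.
    for step, (want, got) in enumerate(zip(expected, predicted)):
        if want != got:
            return step
    # If one chain is shorter, the divergence is at the first missing
    # or extra step.
    if len(expected) != len(predicted):
        return min(len(expected), len(predicted))
    return None


reference = [4, 3, 9]      # correct intermediate results
model_chain = [4, 5, 9]    # hypothetical model output, wrong at step 1
print(first_divergence(reference, model_chain))
```

A final-answer check alone would mark this hypothetical chain as correct, because the last value matches. A step-level comparison reports the error at index 1.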

Abstract

The dominant way of judging Large Language Models (LLMs) has been to ask how well they can recall explicit facts from very long inputs. While today's best models achieve near-perfect recall, this masks a harder skill: performing multi-step reasoning and tracking intermediate state that never appears verbatim. We introduce Verbose ListOps (VLO), a benchmark that embeds deterministic ListOps computations inside narrative camouflage and, crucially, allows step-level evaluation of every intermediate result. Experiments show that models which solve raw ListOps with approximately 100% accuracy collapse on VLO after only 10,000 tokens. By exposing where a model's reasoning chain first diverges, VLO moves assessment beyond sheer context length and toward genuine comprehension. VLO's generation pipeline is task-agnostic: it can weave any deterministically verifiable reasoning schema --…
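
To make "deterministic ListOps computations" and "every intermediate result" concrete, here is a minimal sketch of a raw ListOps evaluator that records each nested operation as it is resolved. It assumes the bracketed prefix notation of the original ListOps dataset and only three operators (`MAX`, `MIN`, and `SM`, sum modulo 10). The operator set VLO actually uses is not stated in this listing, and `evaluate_with_trace` is a hypothetical name.

```python
from typing import Callable

# Assumed operator subset, following the original ListOps dataset.
OPS: dict[str, Callable[[list[int]], int]] = {
    "MAX": max,
    "MIN": min,
    "SM": lambda xs: sum(xs) % 10,  # sum modulo 10
}


def evaluate_with_trace(expr: str) -> tuple[int, list[tuple[str, list[int], int]]]:
    """Evaluate a ListOps expression such as "[MAX 2 9 [MIN 4 7 ] ]".

    Returns the final value plus a trace of (operator, arguments, value)
    for every bracketed sub-expression, innermost first.
    """
    tokens = expr.replace("[", " [ ").replace("]", " ] ").split()
    stack: list[list] = []   # each frame is [operator, arguments]
    trace: list[tuple[str, list[int], int]] = []
    result = None
    for tok in tokens:
        if tok == "[":
            stack.append([None, []])
        elif tok == "]":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            op, args = stack.pop()
            if op is None:
                raise ValueError("bracket without an operator")
            value = OPS[op](args)
            trace.append((op, args, value))
            if stack:
                stack[-1][1].append(value)  # feed the parent operation
            else:
                result = value
        elif tok in OPS:
            stack[-1][0] = tok
        else:
            stack[-1][1].append(int(tok))
    if stack or result is None:
        raise ValueError("incomplete expression")
    return result, trace


final, steps = evaluate_with_trace("[MAX 2 9 [MIN 4 7 ] [SM 8 5 ] ]")
for op, args, value in steps:
    print(op, args, "->", value)
print("final:", final)
```

The trace holds intermediate values (here 4 and 3) that never appear verbatim in the input expression, which is the kind of state a step-level evaluation can check.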

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)