Modeling Language as a Sequence of Thoughts

Nasim Borazjanizadeh; James McClelland

arXiv:2512.25026·cs.CL·January 14, 2026

TL;DR

The paper introduces the Thought Gestalt (TG) model, a recurrent transformer that models language at two levels: tokens and sentence-level "thought" states. Motivated by the compact, event-like representations that humans form during comprehension, it aims to improve relational generalization and efficiency.

Contribution

The TG model is a novel recurrent transformer architecture that incorporates sentence-level thought states with shared parameters, enhancing language modeling and generalization.

Findings

1. TG improves data and parameter efficiency over GPT-2.

2. TG reduces relational-direction errors in generalization tasks.

3. Scaling experiments show TG requires fewer resources to achieve similar performance.

Abstract

Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens, but by relying primarily on surface-level co-occurrence statistics they fail to form globally consistent latent representations of entities and events, which contributes to poor relational generalization (the reversal curse), contextualization errors, and data inefficiency. Cognitive science, by contrast, shows that human comprehension converts linguistic input into compact, event-like representations that persist in memory while verbatim form is short-lived. Motivated by these findings, we introduce the Thought Gestalt (TG) model, a recurrent transformer that models language at two levels of abstraction: tokens and sentence-level "thought" states. TG generates one sentence at a time while cross-attending to a working memory of prior sentence representations. Token and sentence…
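The sentence-by-sentence loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated and does not specify how thought states are computed, so the sketch assumes mean pooling as a stand-in for the learned sentence state, a fixed-size memory, and plain scaled dot-product attention with no learned projections. The names `cross_attend`, `mean_pool`, `process_sentences`, and `memory_size` are hypothetical.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors.

    Assumption: no learned projections; the memory vectors serve as both
    keys and values. Returns a zero vector when the memory is empty.
    """
    if not memory:
        return [0.0] * len(query)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    return [
        sum(w * vec[i] for w, vec in zip(weights, memory))
        for i in range(len(query))
    ]


def mean_pool(token_vectors):
    """Stand-in for a learned sentence-level state (assumption, not from the paper)."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def process_sentences(sentences, memory_size=4):
    """Process one sentence at a time against a bounded working memory.

    Each token attends to the states of earlier sentences only; afterwards
    the current sentence's pooled state is appended to the memory.
    """
    memory = deque(maxlen=memory_size)  # oldest state is dropped when full
    contexts = []
    for token_vectors in sentences:
        snapshot = list(memory)
        contexts.append([cross_attend(tok, snapshot) for tok in token_vectors])
        memory.append(mean_pool(token_vectors))
    return contexts, list(memory)


# Two toy "sentences" of 2-dimensional token vectors.
sentences = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
contexts, memory = process_sentences(sentences, memory_size=1)
```

In this toy run the first sentence sees an empty memory, while the second sentence's token attends to the pooled state of the first. A real model would replace the pooling and attention here with trained transformer layers.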

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Language and cultural evolution