Modeling Language as a Sequence of Thoughts
Nasim Borazjanizadeh, James McClelland

TL;DR
The paper introduces the Thought Gestalt (TG) model, a recurrent transformer that models language at both the token and sentence levels. Inspired by the event-like representations humans form during comprehension, TG improves relational generalization as well as data and parameter efficiency.
Contribution
The TG model is a novel recurrent transformer architecture that incorporates sentence-level thought states with shared parameters, enhancing language modeling and generalization.
Findings
TG improves data and parameter efficiency over GPT-2.
TG reduces relational-direction errors in generalization tasks.
Scaling experiments show TG requires fewer resources to achieve similar performance.
Abstract
Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens, but by relying primarily on surface-level co-occurrence statistics they fail to form globally consistent latent representations of entities and events, which contributes to poor relational generalization (the reversal curse), contextualization errors, and data inefficiency. Cognitive science, by contrast, shows that human comprehension converts linguistic input into compact, event-like representations that persist in memory while verbatim form is short-lived. Motivated by these findings, we introduce the Thought Gestalt (TG) model, a recurrent transformer that models language at two levels of abstraction: tokens and sentence-level "thought" states. TG generates one sentence at a time while cross-attending to a working memory of prior sentence representations. Token and sentence…
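The abstract's core loop (generate one sentence at a time while cross-attending to a working memory of prior sentence representations) can be illustrated with a toy sketch. This is not the authors' implementation. The mean-pooled sentence vector, the FIFO memory capacity, the residual add, and every name below (cross_attend, sentence_thought, WorkingMemory, process_document) are hypothetical stand-ins. Random vectors replace learned token states, and token-level self-attention and generation are omitted.

```python
import math
import random
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over memory slots (keys = values)."""
    if not memory:
        return [0.0] * len(query)  # first sentence: nothing to read yet
    d = len(query)
    weights = softmax([dot(query, m) / math.sqrt(d) for m in memory])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(d)]

def sentence_thought(token_states: List[Vector]) -> Vector:
    # Hypothetical pooling: the mean of token states stands in for a learned
    # sentence-level "thought" representation.
    d, n = len(token_states[0]), len(token_states)
    return [sum(t[i] for t in token_states) / n for i in range(d)]

class WorkingMemory:
    """Hypothetical bounded FIFO store of prior sentence ("thought") vectors."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots: List[Vector] = []

    def write(self, thought: Vector) -> None:
        self.slots.append(thought)
        if len(self.slots) > self.capacity:
            self.slots.pop(0)  # evict the oldest sentence

def process_document(sentences: List[List[Vector]], capacity: int = 4) -> WorkingMemory:
    memory = WorkingMemory(capacity)
    for token_states in sentences:
        # Each token reads only from earlier sentences' thoughts (residual add).
        contextual = [
            [a + b for a, b in zip(t, cross_attend(t, memory.slots))]
            for t in token_states
        ]
        # Compress the contextualized sentence into one vector and store it.
        memory.write(sentence_thought(contextual))
    return memory

random.seed(0)
# Toy document: 6 sentences, 5 tokens each, 8-dimensional token states.
doc = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(5)] for _ in range(6)]
mem = process_document(doc, capacity=4)
print(len(mem.slots))  # prints 4: only the 4 most recent sentence thoughts remain
```

The bounded memory loosely mirrors the abstract's contrast between compact, persistent event-like representations and short-lived verbatim form: only one vector per sentence is kept, not the tokens themselves.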
