HRM-Text: Efficient Pretraining Beyond Scaling

Guan Wang; Changling Liu; Chenyu Wang; Cai Zhou; Yuhao Sun; Yifei Wu; Shuai Zhen; Luca Scimeca; Yasin Abbasi Yadkori

arXiv:2605.20613 · cs.CL · May 21, 2026

TL;DR

HRM-Text introduces a biologically inspired hierarchical recurrent model for language pretraining, achieving competitive performance with significantly less data and compute, thus lowering barriers for foundational research.

Contribution

The paper presents HRM-Text, a hierarchical recurrent architecture for language modeling, together with two stabilization techniques (MagicNorm and warmup deep credit assignment) that enable efficient pretraining on limited data and compute.

Findings

01

Achieves 60.7% on MMLU with only 40 billion tokens

02

Uses 100-900x fewer tokens and 96-432x less compute than standard models

03

Performs competitively with larger open models (2-7B parameters)
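
As a rough sanity check on the figures above, the token ratios can be turned into implied baseline budgets. This is a sketch rather than a calculation taken from the paper: it assumes the 100-900x range is measured against the 40-billion-token budget in finding 01, and the names below are illustrative.

```python
# Illustrative arithmetic only. Assumption: the ratios in finding 02 are
# relative to the 40-billion-token budget quoted in finding 01.
HRM_TEXT_TOKENS = 40 * 10**9
TOKEN_RATIO_RANGE = (100, 900)


def implied_baseline_tokens(ratio: int) -> int:
    """Token budget of a baseline that uses `ratio` times more tokens."""
    return HRM_TEXT_TOKENS * ratio


low, high = (implied_baseline_tokens(r) for r in TOKEN_RATIO_RANGE)
print(f"{low / 10**12:.0f}T to {high / 10**12:.0f}T tokens")
```

Under that assumption, the comparison models would have been trained on roughly 4 to 36 trillion tokens.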

Abstract

The current pretraining paradigm for large language models relies on massive compute and internet-scale raw text, creating a significant barrier to foundational research. In contrast, biological systems demonstrate highly sample-efficient learning through multi-timescale processing, such as the functional organization of the frontoparietal loop. Taking this as inspiration, we introduce HRM-Text, which replaces standard Transformers with a Hierarchical Recurrent Model (HRM) that decouples computation into slow-evolving strategic and fast-evolving execution layers. To stabilize this deep recurrence for language modeling, we introduce MagicNorm and warmup deep credit assignment. Furthermore, instead of standard raw-text pretraining, we train exclusively on instruction-response pairs using a task-completion objective and PrefixLM masking. Serving as an empirical existence proof of efficient…
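
The PrefixLM masking mentioned in the abstract can be illustrated with a small attention-mask builder. This is a generic sketch of the technique, not the authors' code: the function names are hypothetical, and restricting the loss to response positions is an assumption about what the task-completion objective means here.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[bool]]:
    """Build a PrefixLM attention mask.

    mask[i][j] is True when position i may attend to position j.
    Prefix (instruction) positions see the whole prefix bidirectionally;
    later (response) positions see the prefix plus earlier response tokens.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must lie between 0 and total_len")
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


def response_positions(prefix_len: int, total_len: int) -> list[bool]:
    """Flag response positions (assumed here to be the only ones scored)."""
    return [i >= prefix_len for i in range(total_len)]


if __name__ == "__main__":
    # Two instruction tokens followed by two response tokens.
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print("".join("x" if allowed else "." for allowed in row))
```

With a prefix of length zero the mask reduces to an ordinary causal mask, which makes the difference from standard raw-text pretraining easy to see.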

Code & Models

Repositories

sapientinc/HRM-Text (GitHub)
