Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence

Liu Xiao

arXiv:2604.25930 · cs.CL · April 30, 2026


TL;DR

This paper introduces UniMatrix, a family of structured recurrent language models that uses sparse routing and pointer mechanisms to support exact retrieval. It reports competitive byte-level language modeling with fewer parameters and up to 99.2% retrieval accuracy for its sparse-pointer variant.
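The idea of a sparse pointer can be made concrete with a generic top-k memory read: score every stored key against a query, keep only the k best matches, and return a softmax-weighted mix of their values. This is a minimal sketch under stated assumptions, not the paper's mechanism; `sparse_pointer_read`, the dot-product scoring, and the toy memory are all hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between a query and a key."""
    return sum(x * y for x, y in zip(a, b))


def sparse_pointer_read(query: Vector, keys: List[Vector],
                        values: List[Vector], k: int = 1) -> List[float]:
    """Hypothetical top-k sparse read; k=1 acts as a hard pointer to one slot."""
    scores = [dot(query, key) for key in keys]
    # Keep only the k best-matching memory slots; every other slot gets zero weight.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected slots only (subtract the max for numerical stability).
    best = max(scores[i] for i in top)
    weights = {i: math.exp(scores[i] - best) for i in top}
    total = sum(weights.values())
    out = [0.0] * len(values[0])
    for i, w in weights.items():
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return out


keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0], [20.0], [30.0]]
print(sparse_pointer_read([0.0, 2.0], keys, values, k=1))  # [20.0]
```

With k=1 the read copies exactly one stored value, which is what exact retrieval requires; larger k blends the top-k values in proportion to their softmax weights.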

Contribution

The paper presents UniMatrix, a recurrent Transformer variant with hybrid state updates and sparse routing. It improves parameter efficiency relative to a standard Transformer, and its sparse-pointer variant addresses the weak associative recall of the original family.
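UniMatrix is described as a Universal Transformer style model that reuses one shared block across depth, which is one source of its parameter efficiency. The sketch below illustrates only that weight-sharing idea under stated assumptions: the update rule `h + tanh(W h)` is a hypothetical placeholder, not the paper's hybrid state update, and the ROSA-style residual path and embedding modulation are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(W: Matrix, h: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def shared_block(h: List[float], W: Matrix) -> List[float]:
    """Placeholder residual update h + tanh(W h); NOT the paper's hybrid update."""
    return [x + math.tanh(u) for x, u in zip(h, matvec(W, h))]


def universal_forward(h: List[float], W: Matrix, depth: int) -> List[float]:
    """Apply the same block (same W) at every depth step, Universal Transformer style."""
    for _ in range(depth):
        h = shared_block(h, W)
    return h


def block_params(d: int, depth: int, shared: bool) -> int:
    """Weights in a d x d block, counted once if shared, once per layer if stacked."""
    per_block = d * d
    return per_block if shared else per_block * depth


print(block_params(512, 6, shared=True), block_params(512, 6, shared=False))
# 262144 1572864
```

Sharing one block keeps the parameter count independent of depth, while a conventional stack grows linearly with the number of layers; the hidden size of 512 and depth of 6 are arbitrary example values.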

Findings

01

UniMatrix-Core and UniMatrix-ROSA slightly outperform a parameter-matched Transformer on byte-level WikiText-2 (5.084 and 5.083 vs. 5.124 bits-per-byte) while using fewer parameters.

02

The original UniMatrix family performs near chance on associative recall.

03

UniMatrix-SparsePointer achieves up to 99.2% retrieval accuracy with fewer parameters.
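To make "near chance" concrete, here is a minimal sketch of a synthetic associative recall task: a sequence of key-value pairs followed by a query key, where the target is the value paired with that key. The token layout, vocabulary sizes, and the helpers `make_recall_example`, `oracle_recall`, and `chance_accuracy` are hypothetical; the paper's exact task configuration is not given here.

```python
import random
from typing import List, Tuple


def make_recall_example(num_pairs: int, num_keys: int, num_values: int,
                        rng: random.Random) -> Tuple[List[int], int]:
    """Build k1 v1 k2 v2 ... kn vn q; the target is the value paired with key q.

    Keys use token ids [0, num_keys); values use [num_keys, num_keys + num_values).
    """
    keys = rng.sample(range(num_keys), num_pairs)  # distinct keys: each query has one answer
    values = [num_keys + rng.randrange(num_values) for _ in keys]
    seq: List[int] = []
    for k, v in zip(keys, values):
        seq.extend([k, v])
    q = rng.randrange(num_pairs)
    seq.append(keys[q])  # query token goes last
    return seq, values[q]


def oracle_recall(seq: List[int]) -> int:
    """Exact retrieval: scan the key-value pairs for the final query key."""
    query, pairs = seq[-1], seq[:-1]
    for i in range(0, len(pairs), 2):
        if pairs[i] == query:
            return pairs[i + 1]
    raise ValueError("query key not found")


def chance_accuracy(num_values: int) -> float:
    """Accuracy of guessing a value token uniformly at random."""
    return 1.0 / num_values


rng = random.Random(0)
data = [make_recall_example(8, 64, 64, rng) for _ in range(1000)]
oracle_acc = sum(oracle_recall(s) == y for s, y in data) / len(data)
print(oracle_acc, chance_accuracy(64))  # 1.0 0.015625
```

A model that cannot look up the stored pair scores close to `chance_accuracy`, while a mechanism that points back to the matching key, like the oracle scan, can approach perfect accuracy.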

Abstract

We study whether a structured recurrent state can serve as a compact associative backbone for language modeling while still supporting exact retrieval. We introduce UniMatrix, a Universal Transformer style family that reuses a shared recurrent block across depth and augments it with hybrid state updates, a ROSA-style residual path, and token-conditioned embedding modulation. We evaluate these models on byte-level WikiText-2, synthetic associative recall, throughput profiling on Apple MPS, and a corrected benchmark for triple-token interactions. At small scale, UniMatrix-Core and UniMatrix-ROSA slightly outperform a parameter-matched Transformer on WikiText-2 while using many fewer parameters, reaching 5.084 and 5.083 bits-per-byte versus 5.124. The main negative result is equally important: on associative recall, the original UniMatrix family remains near chance while the Transformer…
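The WikiText-2 results are reported in bits-per-byte. For a byte-level model every token is one byte, so bits-per-byte is the mean per-byte negative log-likelihood converted from nats to bits. This is a minimal sketch assuming the model supplies natural-log probabilities for each target byte; the paper's evaluation code is not shown, and `bits_per_byte` and `relative_reduction` are illustrative helpers.

```python
import math
from typing import Sequence


def bits_per_byte(byte_log_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood per byte, converted from nats to bits."""
    mean_nll_nats = -sum(byte_log_probs) / len(byte_log_probs)
    return mean_nll_nats / math.log(2)


def relative_reduction(baseline_bpb: float, model_bpb: float) -> float:
    """Fractional improvement of a model's bits-per-byte over a baseline."""
    return (baseline_bpb - model_bpb) / baseline_bpb


# A model that is uniform over all 256 byte values scores 8 bits per byte.
uniform = [math.log(1 / 256)] * 100
print(round(bits_per_byte(uniform), 6))  # 8.0
print(round(relative_reduction(5.124, 5.084), 4))  # 0.0078
```

By this arithmetic, the reported 5.084 versus 5.124 bits-per-byte is a reduction of roughly 0.8% relative to the Transformer baseline.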
