Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence
Liu Xiao

TL;DR
This paper introduces UniMatrix, a structured recurrent model family for language modeling that adds sparse routing and pointer mechanisms to support exact retrieval. At small scale it slightly outperforms a Transformer on byte-level WikiText-2 with fewer parameters, and its SparsePointer variant reaches up to 99.2% retrieval accuracy.
Contribution
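The paper presents UniMatrix, a Universal Transformer-style family that reuses a shared recurrent block across depth and adds hybrid state updates and sparse routing. Some variants match or slightly beat a standard Transformer on language modeling with fewer parameters, and the sparse-pointer variant restores the retrieval ability that the original family lacks.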
Findings
UniMatrix-Core and UniMatrix-ROSA slightly outperform a Transformer on byte-level WikiText-2 with fewer parameters (5.084 and 5.083 bits-per-byte versus 5.124; lower is better).
The original UniMatrix family performs near chance on associative recall.
UniMatrix-SparsePointer achieves up to 99.2% retrieval accuracy with fewer parameters.
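To put "slightly outperform" in perspective, the sketch below converts a mean per-byte loss to bits-per-byte and computes the relative gap from the figures quoted in the abstract. The helper names `bits_per_byte` and `relative_gap` are illustrative and not taken from the paper, and the conversion assumes the loss is a natural-log cross-entropy averaged over bytes.

```python
import math


def bits_per_byte(mean_nll_nats: float) -> float:
    """Convert a mean per-byte negative log-likelihood in nats to bits-per-byte."""
    return mean_nll_nats / math.log(2)


def relative_gap(baseline: float, model: float) -> float:
    """Fractional reduction relative to the baseline (lower bits-per-byte is better)."""
    return (baseline - model) / baseline


# Sanity check: a uniform guess over 256 byte values costs 8 bits per byte.
uniform_bpb = bits_per_byte(math.log(256))

# Bits-per-byte figures as quoted in the abstract.
transformer_bpb = 5.124
core_bpb = 5.084
rosa_bpb = 5.083

core_gap = relative_gap(transformer_bpb, core_bpb)
rosa_gap = relative_gap(transformer_bpb, rosa_bpb)
print(f"Core: {core_gap:.2%}, ROSA: {rosa_gap:.2%}")  # Core: 0.78%, ROSA: 0.80%
```

Both gaps are under one percent of the baseline, which is consistent with the "slightly" wording.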
Abstract
We study whether a structured recurrent state can serve as a compact associative backbone for language modeling while still supporting exact retrieval. We introduce UniMatrix, a Universal Transformer style family that reuses a shared recurrent block across depth and augments it with hybrid state updates, a ROSA-style residual path, and token-conditioned embedding modulation. We evaluate these models on byte-level WikiText-2, synthetic associative recall, throughput profiling on Apple MPS, and a corrected benchmark for triple-token interactions. At small scale, UniMatrix-Core and UniMatrix-ROSA slightly outperform a parameter-matched Transformer on WikiText-2 while using many fewer parameters, reaching 5.084 and 5.083 bits-per-byte versus 5.124. The main negative result is equally important: on associative recall, the original UniMatrix family remains near chance while the Transformer…
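The phrase "reuses a shared recurrent block across depth" can be illustrated with a minimal sketch of weight tying. This is not the UniMatrix architecture: the block here is a plain residual tanh layer chosen for brevity, and every name (`make_block`, `apply_block`, `shared_depth_forward`, `count_params`) is hypothetical. It leaves out attention, hybrid state updates, the ROSA-style path, and embedding modulation entirely.

```python
import math
import random


def make_block(dim: int, seed: int = 0):
    """Create ONE set of weights; every depth step reuses it."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    weight = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return weight, bias


def apply_block(params, state):
    """Residual update: state + tanh(W @ state + b). A stand-in block, not the paper's."""
    weight, bias = params
    return [
        s + math.tanh(sum(w * x for w, x in zip(row, state)) + b)
        for s, row, b in zip(state, weight, bias)
    ]


def shared_depth_forward(params, state, num_steps: int):
    """Apply the same block num_steps times, so weights are tied across depth."""
    for _ in range(num_steps):
        state = apply_block(params, state)
    return state


def count_params(params) -> int:
    """Count scalar parameters in the single shared block."""
    weight, bias = params
    return sum(len(row) for row in weight) + len(bias)


dim = 4
params = make_block(dim)
x = [0.1, -0.2, 0.3, 0.0]
out_2 = shared_depth_forward(params, x, num_steps=2)
out_6 = shared_depth_forward(params, x, num_steps=6)
print(count_params(params))  # 20 at any depth: dim * dim + dim
```

Because the same weights serve every step, running 2 or 6 steps changes compute but not parameter count, which is one way a depth-shared model can use fewer parameters than a stack of distinct layers.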
