Tiny Recursive Reasoning with Mamba-2 Attention Hybrid
Wenlong Wang, Fergal Reid

TL;DR
This paper replaces the Transformer blocks in the TRM recursive reasoning model with Mamba-2 hybrid operators at matched parameter count. On ARC-AGI-1, the hybrid improves pass@2 and higher-K candidate coverage while keeping pass@1 on par with the Transformer baseline.
Contribution
It introduces Mamba-2 hybrid operators into recursive reasoning models, showing they preserve reasoning ability and enhance performance over Transformer-based approaches.
Findings
Mamba-2 hybrid improves pass@2 on ARC-AGI-1 by +2.0% (45.88% vs 43.88%)
Hybrid outperforms at higher K values (+4.75% at pass@100)
Hybrid maintains pass@1 parity with the Transformer-based model
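The pass@K numbers above can be read with a small sketch. It assumes pass@K counts a task as solved when any of the model's top K ranked candidate answers exactly matches the target, which is the usual reading of the metric; the function name `pass_at_k` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence


def pass_at_k(
    ranked_candidates: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of tasks whose target appears among the top-k candidates.

    Assumption: candidates are already sorted best-first for each task.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_candidates) != len(targets):
        raise ValueError("one candidate list is needed per target")
    solved = sum(
        1
        for candidates, target in zip(ranked_candidates, targets)
        if target in candidates[:k]
    )
    return solved / len(targets)


# Toy data (hypothetical): four tasks, three ranked candidates each.
candidates = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"], ["m", "n", "o"]]
targets = ["a", "y", "r", "unseen"]

scores = {k: pass_at_k(candidates, targets, k) for k in (1, 2, 3)}
```

On this toy data the score rises with K because a correct answer ranked second or third only counts once K is large enough. That is the sense in which a model can match another at pass@1 yet show better candidate coverage at higher K.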
Abstract
Recent work on recursive reasoning models like TRM demonstrates that tiny networks (7M parameters) can achieve strong performance on abstract reasoning tasks through latent recursion -- iterative refinement in hidden representation space without emitting intermediate tokens. This raises a natural question about operator choice: Mamba-2's state space recurrence is itself a form of iterative refinement, making it a natural candidate for recursive reasoning -- but does introducing Mamba-2 into the recursive scaffold preserve reasoning capability? We investigate this by replacing the Transformer blocks in TRM with Mamba-2 hybrid operators while maintaining parameter parity (6.83M vs 6.86M parameters). On ARC-AGI-1, we find that the hybrid improves pass@2 (the official metric) by +2.0% (45.88% vs 43.88%) and consistently outperforms at higher K values (+4.75% at pass@100), whilst…
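As a rough illustration of latent recursion as the abstract describes it (iterative refinement of a hidden state with no intermediate tokens emitted), the toy sketch below repeatedly applies one refinement operator to a latent vector and reads out a result only at the end. Everything here is an assumption made for illustration: the `refine` operator is a simple tanh-affine map standing in for whatever block the scaffold uses (Transformer or Mamba-2 hybrid), and the weights, sizes, and step count are hypothetical. It is not TRM's actual update rule.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def refine(z: Vector, x: Vector, w_z: Matrix, w_x: Matrix) -> Vector:
    """One refinement step: z <- tanh(w_z z + w_x x).

    Stand-in operator (assumption); a real model would use a learned block here.
    """
    hz, hx = matvec(w_z, z), matvec(w_x, x)
    return [math.tanh(a + b) for a, b in zip(hz, hx)]


def latent_recursion(x: Vector, w_z: Matrix, w_x: Matrix, steps: int) -> Vector:
    """Refine a latent state `steps` times; nothing is emitted in between."""
    z = [0.0] * len(w_z)  # initial latent state
    for _ in range(steps):
        z = refine(z, x, w_z, w_x)  # same operator reused at every step
    return z


# Hypothetical toy weights and input embedding.
w_z = [[0.5, 0.0], [0.0, 0.5]]
w_x = [[1.0, 0.0], [0.0, 1.0]]
x = [0.2, -0.1]

z_50 = latent_recursion(x, w_z, w_x, steps=50)
z_51 = latent_recursion(x, w_z, w_x, steps=51)
```

The point of the sketch is the control flow: the same small operator is reused across steps, so depth of computation comes from recursion rather than from parameter count. Swapping the operator inside `refine` is the kind of substitution the abstract investigates.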
Taxonomy
Topics: Advanced Graph Neural Networks · Constraint Satisfaction and Optimization · Neural Networks and Reservoir Computing
