Scaling Reasoning without Attention

Xueliang Zhao; Wei Wu; Lingpeng Kong

arXiv:2505.22425·cs.LG·May 29, 2025


TL;DR

This paper introduces an attention-free language model built on state space dual (SSD) layers. It achieves fixed-memory, constant-time inference and, through a two-phase curriculum fine-tuning approach, outperforms larger models on reasoning benchmarks.

Contribution

The paper presents an attention-free language model based on SSD layers, together with a curriculum fine-tuning strategy for complex reasoning tasks. The resulting 7B model outperforms comparable Transformer models and the larger Gemma3-27B on reasoning benchmarks.

Findings

01

The proposed 7B model surpasses comparable Transformer models on reasoning benchmarks.

02

The proposed 7B model outperforms the larger Gemma3-27B model on AIME and LiveCodeBench.

03

The model achieves fixed-memory, constant-time inference without self-attention.

Abstract

Large language models (LLMs) have made significant advances in complex reasoning tasks, yet they remain bottlenecked by two core challenges: architectural inefficiency due to reliance on Transformers, and a lack of structured fine-tuning for high-difficulty domains. We introduce an attention-free language model that addresses both issues through architectural and data-centric innovations. Built on the state space dual (SSD) layers of Mamba-2, our model eliminates the need for self-attention and key-value caching, enabling fixed-memory, constant-time inference. To train it for complex reasoning, we propose a two-phase curriculum fine-tuning strategy based on the PromptCoT synthesis paradigm, which generates pedagogically structured problems via abstract concept selection and rationale-guided generation. On benchmark evaluations, the proposed 7B model outperforms strong…
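
To make the "fixed-memory, constant-time inference" claim concrete, the following is a minimal sketch in pure Python. It is not the Mamba-2 SSD layer: it assumes a toy linear recurrence with a single scalar decay `a` and fixed vectors `b` and `c`, and every name in it (`recurrent_step`, `run_recurrent`, `run_with_cache`) is hypothetical. It only illustrates why a recurrent state stays the same size at every decoding step, while a cache of past inputs grows with sequence length.

```python
def recurrent_step(state, x, a, b, c):
    """One decoding step of a toy linear recurrence (illustrative only).

    state: list of N floats, the fixed-size hidden state h_{t-1}
    x:     scalar input at this step
    a:     scalar decay applied to the whole state
    b, c:  lists of N floats (input and output projections)
    Returns (h_t, y_t) with h_t = a * h_{t-1} + b * x and y_t = c . h_t.
    """
    new_state = [a * h + bi * x for h, bi in zip(state, b)]
    y = sum(ci * h for ci, h in zip(c, new_state))
    return new_state, y


def run_recurrent(xs, a, b, c):
    """Decode step by step; memory is the N-element state, whatever len(xs) is."""
    state = [0.0] * len(b)
    ys, state_sizes = [], []
    for x in xs:
        state, y = recurrent_step(state, x, a, b, c)
        ys.append(y)
        state_sizes.append(len(state))
    return ys, state_sizes


def run_with_cache(xs, a, b, c):
    """Same outputs, computed by keeping every past input in a growing cache.

    Unrolling the recurrence gives y_t = (c . b) * sum_{s<=t} a^(t-s) * x_s,
    so each step revisits the whole cache, as attention revisits its KV cache.
    """
    cb = sum(ci * bi for ci, bi in zip(c, b))
    cache, ys, cache_sizes = [], [], []
    for x in xs:
        cache.append(x)
        t = len(cache) - 1
        ys.append(sum(cb * a ** (t - s) * xs_s for s, xs_s in enumerate(cache)))
        cache_sizes.append(len(cache))
    return ys, cache_sizes


xs = [1.0, 0.5, -2.0, 3.0]
b = [0.3, -0.1, 0.7]
c = [1.0, 2.0, -1.0]
ys_rec, state_sizes = run_recurrent(xs, 0.9, b, c)
ys_cache, cache_sizes = run_with_cache(xs, 0.9, b, c)
```

Both functions produce the same outputs up to floating-point error. In the recurrent form the per-step work and memory depend only on the state size `N`, whereas the cached form does work proportional to the number of tokens seen so far.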


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)