Resa: Transparent Reasoning Models via SAEs

Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Deqing Fu, and Willie Neiswanger

arXiv:2506.09967 · cs.CL · June 17, 2025

TL;DR

Resa introduces SAE-Tuning, a sparse-autoencoder-based method for eliciting strong reasoning abilities in language models. It retains >97% of RL-trained reasoning performance at roughly $1 of training cost, and the extracted reasoning abilities are modular and generalizable.

Contribution

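The paper presents SAE-Tuning, a two-stage approach: it first trains a sparse autoencoder (SAE) to capture reasoning abilities from a source model, then uses the trained SAE to guide standard supervised fine-tuning of a target model. Both stages use verified question-answer data without any reasoning traces.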
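As a rough illustration of the building block that SAE-Tuning relies on, the sketch below shows a generic sparse autoencoder over model activations: a ReLU encoder, a linear decoder, and a loss that combines reconstruction error with an L1 sparsity penalty. This is a textbook formulation under stated assumptions, not the authors' implementation; the function names (`init_sae`, `sae_forward`, `sae_loss`), the dimensions, and the `l1_coeff` value are all hypothetical. How the trained SAE then guides supervised fine-tuning is not described on this page, so that stage is not sketched.

```python
import numpy as np

def init_sae(d_model, d_hidden, seed=0):
    """Create parameters for a generic sparse autoencoder (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    return {
        "W_enc": rng.normal(0.0, scale, (d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, scale, (d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    """Encode activations into non-negative features, then reconstruct them."""
    z = np.maximum(0.0, (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"])
    x_hat = z @ params["W_dec"] + params["b_dec"]
    return z, x_hat

def sae_loss(params, x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    z, x_hat = sae_forward(params, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(z), axis=-1))
    return recon + l1_coeff * sparsity

# Stand-in for a batch of activations from a source model (random data, assumed shape).
acts = np.random.default_rng(1).normal(size=(8, 16))
params = init_sae(d_model=16, d_hidden=64)
features, reconstruction = sae_forward(params, acts)
loss = sae_loss(params, acts)
```

In practice the hidden width is usually larger than the activation width, so that individual features can specialize; the sizes above are chosen only to keep the example small.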

Findings

01. Retains >97% of RL-trained reasoning performance while cutting training cost by >2000x (to roughly $1) and training time by >450x (to around 20 minutes)

02. Enables reasoning in lightly RL-trained models for around $1

03. Extracted reasoning abilities are generalizable and modular
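The reduction factors quoted in the abstract (>2000x in cost to roughly $1, >450x in time to around 20 minutes) imply rough lower bounds for the RL-trained baseline. The short calculation below makes that arithmetic explicit; it is a back-of-the-envelope sketch that treats the rounded figures as exact, and the helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(reduced_value, reduction_factor):
    """Baseline value implied by a reduced value and its reduction factor."""
    return reduced_value * reduction_factor

# Figures taken from the abstract and treated as exact for illustration.
baseline_cost_usd = implied_baseline(1.0, 2000)      # cost: roughly $1 after a >2000x reduction
baseline_time_hours = implied_baseline(20, 450) / 60  # time: around 20 minutes after a >450x reduction

print(f"Implied RL baseline cost: more than about ${baseline_cost_usd:.0f}")
print(f"Implied RL baseline time: more than about {baseline_time_hours:.0f} hours")
```

Because the abstract states the factors as lower bounds and the reduced values as approximate, the results are order-of-magnitude estimates, not reported measurements.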

Abstract

How cost-effectively can we elicit strong reasoning in language models by leveraging their underlying representations? We answer this question with Resa, a family of 1.5B reasoning models trained via a novel and efficient sparse autoencoder tuning (SAE-Tuning) procedure. This method first trains an SAE to capture reasoning abilities from a source model, and then uses the trained SAE to guide a standard supervised fine-tuning process to elicit such abilities in a target model, all using verified question-answer data without any reasoning traces. Notably, when applied to certain base models before further RL post-training, SAE-Tuning retains >97% of its RL-trained counterpart's reasoning performance while reducing training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes. Furthermore, when applied to lightly RL-trained models (e.g., within 1 hour on 2 GPUs),…

Code & Models

Repositories

shangshang-wang/resa
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Balanced Selection · Sparse Autoencoder