Learning to Reason in 13 Parameters

John X. Morris; Niloofar Mireshghallah; Mark Ibrahim; Saeed Mahloujifar

arXiv:2602.04118 · cs.LG · February 5, 2026

TL;DR

This paper introduces TinyLoRA, a low-rank adapter method that scales down to as few as one trained parameter. Trained with reinforcement learning, it reaches 91% accuracy on GSM8K with only 13 trained parameters.

Contribution

Proposes TinyLoRA, a low-rank adapter method that scales to sizes as small as one parameter, below what conventional LoRA allows, so that reasoning can be trained in large models with extremely few parameters.

Findings

01 Achieves 91% accuracy on GSM8K with only 13 trained parameters (26 bytes in bf16).

02 Recovers 90% of performance improvements while training 1000x fewer parameters on harder benchmarks such as AIME, AMC, and MATH500.

03 RL training is significantly more parameter-efficient than supervised fine-tuning.

Abstract

Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using…
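
The size arithmetic in the abstract can be checked with a short sketch. It assumes the standard LoRA factorization, in which the update to a weight matrix is B @ A with A of shape (rank, d_in) and B of shape (d_out, rank), so even rank 1 costs d_in + d_out trainable parameters. The function names and the width 4096 are hypothetical, chosen for illustration only and not taken from the paper. This is not the TinyLoRA parameterization itself, which the abstract does not spell out.

```python
def lora_param_count(d_in: int, d_out: int, rank: int = 1) -> int:
    """Trainable parameters in a standard LoRA update B @ A.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def storage_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to store num_params values; bf16 uses 2 bytes each."""
    return num_params * bytes_per_param


# Hypothetical square projection of width 4096 (an assumption for
# illustration, not a dimension reported in the paper).
d_model = 4096

# Rank-1 LoRA on one such matrix already needs d_in + d_out parameters.
rank1_params = lora_param_count(d_model, d_model, rank=1)
print(rank1_params)       # 8192

# The abstract's figure: 13 trained parameters stored in bf16.
print(storage_bytes(13))  # 26
```

Under these assumptions, a single rank-1 adapter on one matrix is already hundreds of times larger than the 13 parameters the abstract reports, which is the floor that motivates a different parameterization.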

Code & Models

Models

🤗 neopolita/TinyLoRA-TexasHoldEm-Llama-3.2-1B-Instruct (model)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications