Learning to Reason in 13 Parameters
John X. Morris, Niloofar Mireshghallah, Mark Ibrahim, Saeed Mahloujifar

TL;DR
This paper introduces TinyLoRA, a method for scaling low-rank adapters down to as few as one trained parameter. Trained with reinforcement learning, it reaches 91% accuracy on GSM8K with only 13 trained parameters.
Contribution
Proposes TinyLoRA, a low-rank adapter parameterization that scales to sizes as small as one parameter, below the size floor that conventional LoRA cannot go under, and uses it to train reasoning in large models.
Findings
Achieves 91% accuracy on GSM8K with only 13 trained parameters.
Recovers 90% of performance improvements while training 1000x fewer parameters.
RL training significantly outperforms supervised fine-tuning in parameter efficiency.
Abstract
Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using…
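The size claims in the abstract can be checked with a little arithmetic. The sketch below is not TinyLoRA itself (the excerpt does not describe its parameterization). It only counts the trainable parameters of a standard LoRA update on one square weight matrix and converts a parameter count to bytes in bf16. The hidden size of 4096 is a hypothetical value chosen for illustration, and the helper names `lora_param_count` and `bf16_bytes` are made up for this example.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a standard LoRA update B @ A for one weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def bf16_bytes(num_params: int) -> int:
    """Storage in bytes for num_params values kept in bf16."""
    return num_params * BYTES_PER_BF16


hidden = 4096  # hypothetical model dimension, for illustration only

# Even at rank 1, standard LoRA on a single square matrix trains
# d_in + d_out parameters, so its size is tied to the model dimension.
print(lora_param_count(hidden, hidden, rank=1))  # 8192

# The 13 trained parameters reported in the abstract, stored in bf16.
print(bf16_bytes(13))  # 26
```

Under these assumptions, a single rank-1 adapter on one matrix is already hundreds of times larger than the 13-parameter setting, which is why going smaller requires a different parameterization rather than a lower rank.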
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
