TL;DR
This paper demonstrates that a single, well-designed training sample can significantly improve large language models' reasoning across multiple disciplines, challenging the need for large data volumes.
Contribution
It introduces polymath learning, a framework for one-shot reinforcement learning that uses a single, strategically engineered sample to enhance multidisciplinary reasoning in LLMs.
Findings
A single math reasoning sample improves performance across physics, chemistry, and biology.
An analysis of salient mathematical skills reveals key characteristics of effective samples.
Synthetic multidisciplinary samples outperform natural samples on reasoning benchmarks.
Abstract
The reasoning ability of large language models (LLMs) can be unleashed with reinforcement learning (RL) (OpenAI, 2024; DeepSeek-AI et al., 2025a; Zeng et al., 2025). The success of existing RL attempts on LLMs usually relies on large volumes of high-quality samples. In this paper, we challenge conventional assumptions about data requirements in RL for LLMs by demonstrating the effectiveness of one-shot reinforcement learning. Specifically, we introduce polymath learning, a framework for designing one training sample that elicits multidisciplinary reasoning improvement. We present three key findings: (1) A single, strategically selected math reasoning sample can produce significant performance improvements across multiple domains, including physics, chemistry, and biology; (2) Analysis of salient mathematical skills provides insight into the characteristics associated with effective…
