One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning

Yiyuan Li; Zhen Huang; Yanan Wu; Weixun Wang; Xuefeng Li; Yijia Luo; Wenbo Su; Bo Zheng; Pengfei Liu

arXiv:2601.03111 · cs.LG · April 3, 2026

TL;DR

This paper demonstrates that a single, well-designed training sample can significantly improve large language models' reasoning across multiple disciplines, challenging the need for large data volumes.

Contribution

It introduces polymath learning, a framework for one-shot reinforcement learning that uses a single, strategically engineered sample to enhance multidisciplinary reasoning in LLMs.

Findings

1. A single math reasoning sample improves performance across physics, chemistry, and biology.

2. Salient mathematical skills analysis reveals key characteristics of effective samples.

3. Synthetic multidisciplinary samples outperform natural samples in reasoning benchmarks.

Abstract

The reasoning ability of large language models (LLMs) can be unleashed with reinforcement learning (RL) (OpenAI, 2024; DeepSeek-AI et al., 2025a; Zeng et al., 2025). The success of existing RL attempts in LLMs usually relies on large volumes of high-quality samples. In this paper, we challenge conventional assumptions about data requirements in RL for LLMs by demonstrating the effectiveness of one-shot reinforcement learning. Specifically, we introduce polymath learning, a framework for designing one training sample that elicits multidisciplinary reasoning improvement. We present three key findings: (1) A single, strategically selected math reasoning sample can produce significant performance improvements across multiple domains, including physics, chemistry, and biology; (2) Analysis of salient mathematical skills provides insight into the characteristics associated with effective…

Code & Models

Repositories

gair-nlp/polymath-learning (GitHub)
