Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak

TL;DR
This paper demonstrates that physics simulators can generate synthetic data to train large language models, enabling them to perform well on real-world physics reasoning tasks without relying on large-scale QA datasets.
Contribution
The authors introduce a method of using physics simulators as scalable data sources for training LLMs, achieving zero-shot transfer to real physics problems.
Findings
Training solely on synthetic simulated data improves performance on International Physics Olympiad (IPhO) problems by 5-10%.
Physics simulators can serve as an alternative source of supervision for physical reasoning in domains that lack large-scale QA datasets.
Models trained with this method exhibit zero-shot transfer to real-world physics benchmarks.
Abstract
We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on…
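The pipeline the abstract describes (sample a random scene, simulate it, turn the outcome into a question-answer pair, score a model's answer) can be sketched as below. This is only an illustration under stated assumptions: a hand-written projectile integrator stands in for a real physics engine, and `sample_scene`, `simulate_range`, `make_qa`, and `reward` are hypothetical names that do not come from the authors' code. The tolerance-based binary reward is likewise an assumption, and the reinforcement learning step itself is not shown.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def sample_scene(rng):
    """Sample a random projectile scene (hypothetical scene format)."""
    return {
        "speed": round(rng.uniform(5.0, 30.0), 1),       # launch speed, m/s
        "angle_deg": round(rng.uniform(15.0, 75.0), 1),  # launch angle, degrees
    }


def simulate_range(scene, dt=1e-4):
    """Step the trajectory numerically and return the landing distance in metres."""
    theta = math.radians(scene["angle_deg"])
    vx = scene["speed"] * math.cos(theta)
    vy = scene["speed"] * math.sin(theta)
    x, y = 0.0, 0.0
    while True:
        # Semi-implicit Euler step: update velocity first, then position.
        vy_new = vy - G * dt
        x_new = x + vx * dt
        y_new = y + vy_new * dt
        if y_new < 0.0:
            # Linearly interpolate to the point where the path crosses y = 0.
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y, vy = x_new, y_new, vy_new


def make_qa(scene):
    """Turn a simulated scene into a synthetic question-answer pair."""
    question = (
        f"A ball is launched from level ground at {scene['speed']} m/s, "
        f"{scene['angle_deg']} degrees above the horizontal. Ignoring air "
        "resistance, how far away does it land (in metres)?"
    )
    return {"question": question, "answer": round(simulate_range(scene), 2)}


def reward(predicted, answer, rel_tol=0.05):
    """Binary reward: 1.0 if the prediction is within rel_tol of the simulated answer."""
    return 1.0 if abs(predicted - answer) <= rel_tol * abs(answer) else 0.0


if __name__ == "__main__":
    qa = make_qa(sample_scene(random.Random(0)))
    print(qa["question"])
    print("simulated answer:", qa["answer"])
```

Because the answer comes from the simulation rather than from a human-written solution, pairs like this can be generated in unlimited quantity, and the reward needs only a numeric comparison.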
Code & Models
- 🤗 asatpath/Sim2Reason-3B (model) · 56 dl · ♡ 1
- 🤗 asatpath/Sim2Reason-7B (model) · 37 dl · ♡ 1
- 🤗 asatpath/Sim2Reason-14B (model) · 55 dl · ♡ 1
- 🤗 asatpath/Sim2Reason-30B (model) · 18 dl
- 🤗 mradermacher/Sim2Reason-3B-GGUF (model) · 266 dl · ♡ 1
- 🤗 mradermacher/Sim2Reason-7B-GGUF (model) · 232 dl · ♡ 1
- 🤗 mradermacher/Sim2Reason-14B-GGUF (model) · 437 dl · ♡ 2
- 🤗 mradermacher/Sim2Reason-30B-GGUF (model) · 348 dl
- 🤗 asatpath/Sim2Reason-32B (model) · 38 dl
