Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

Mihir Prabhudesai; Aryan Satpathy; Yangmin Li; Zheyang Qin; Nikash Bhardwaj; Amir Zadeh; Chuan Li; Katerina Fragkiadaki; Deepak Pathak

arXiv:2604.11805·cs.LG·April 14, 2026

TL;DR

This paper demonstrates that physics simulators can generate synthetic data to train large language models, enabling them to perform well on real-world physics reasoning tasks without relying on large-scale QA datasets.

Contribution

The authors introduce a method of using physics simulators as scalable data sources for training LLMs, achieving zero-shot transfer to real physics problems.

Findings

1. Training on synthetic data improves performance on International Physics Olympiad (IPhO) problems by 5-10%.

2. Physics simulators can effectively replace large-scale QA datasets for physical reasoning.

3. Models trained with this method exhibit zero-shot transfer to real-world physics benchmarks.

Abstract

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on…
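
The pipeline described in the abstract (random scene, simulated outcome, question-answer pair, verifiable reward) can be illustrated with a toy sketch. This is not the authors' implementation: the summary here does not say which physics engines, question templates, or reinforcement learning algorithm the paper uses. The example below assumes a single hypothetical scene type (a projectile on level ground), a hand-written numerical integrator standing in for a physics engine, and a simple binary reward with a 5% tolerance. The names `simulate_projectile`, `make_qa_pair`, and `reward` are all invented for illustration.

```python
import math
import random
import re


def simulate_projectile(speed, angle_deg, g=9.81, dt=1e-4):
    """Toy stand-in for a physics engine (assumption, not the paper's simulator).

    Steps a point mass launched from level ground until it lands and
    returns the horizontal distance travelled, in meters.
    """
    angle = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    while True:
        # Semi-implicit Euler step: update velocity first, then position.
        vy_new = vy - g * dt
        x_new = x + vx * dt
        y_new = y + vy_new * dt
        if y_new < 0.0:
            # Interpolate linearly back to the ground plane y = 0.
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y, vy = x_new, y_new, vy_new


def make_qa_pair(rng):
    """Sample a random scene and turn its simulated outcome into a QA pair."""
    speed = round(rng.uniform(5.0, 30.0), 1)
    angle = round(rng.uniform(15.0, 75.0), 1)
    answer = simulate_projectile(speed, angle)
    question = (
        f"A ball is launched from level ground at {speed} m/s, "
        f"{angle} degrees above the horizontal. Neglecting air resistance "
        "and taking g = 9.81 m/s^2, how far away does it land, in meters?"
    )
    return question, answer


def reward(completion, answer, rel_tol=0.05):
    """Hypothetical binary reward: 1.0 if the last number in the model's
    completion is within rel_tol of the simulated answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?", completion)
    if not numbers:
        return 0.0
    predicted = float(numbers[-1])
    return 1.0 if abs(predicted - answer) <= rel_tol * abs(answer) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    question, answer = make_qa_pair(rng)
    print(question)
    print(reward(f"The ball lands {answer:.2f} m away", answer))
```

In a reinforcement learning loop, a scalar such as the one returned by `reward` would score each sampled completion. The point of the sketch is that the ground-truth answer comes from the simulation rather than from a human-written dataset, so new training problems can be generated without limit.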
