Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
Shan Yang

TL;DR
This paper audits and improves multimodal physics reasoning evaluation pipelines, addressing undetected biases and introducing a new dataset and model that significantly enhance performance on physics olympiad tasks.
Contribution
It identifies evaluation biases, releases a curated multimodal corpus, and proposes Physics-R1, a new model that advances visual physics reasoning performance.
Findings
An audit of public physics training pools against existing evals reveals near-duplicate and paraphrase overlap that single-stage checks miss.
Physics-R1 improves accuracy by over 15 percentage points on key physics olympiad benchmarks.
The release comprises four artifacts, including datasets and a reference recipe for future research.
Abstract
We audit the multimodal-physics evaluation pipeline end-to-end and document three undetected construction practices that distort how the field measures vision-language reasoning: train-eval contamination, translation drift, and MCQ saturation. (1) Public training pools (UGPhysics-Train, SciInstruct, MMK12) pass single-stage 5-gram-Jaccard audits with zero hits across all six public physics evals; a three-stage audit (Jaccard -> mxbai-embed-large cosine -> Haiku-4.5 LLM-judge) surfaces 134 near-duplicates and 4,846 paraphrase candidates in SciInstruct alone. (2) A 17-pp Sonnet 4.5 delta on 59 paired Estonian-English olympiad problems (30.5% vs. 13.6%; sign test p=0.011, McNemar p=0.021, paired bootstrap 95% CI [+5.1, +28.9] pp). (3) A 46-pp format-and-novelty gradient on identical Sonnet weights between MCQ (79.7% on PhyX) and open-ended olympiad evaluation (33.4% on PhysOlym-A). We…
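The first audit stage named in the abstract, a 5-gram Jaccard check, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes word-level 5-grams over lowercased, whitespace-split text, and the function names and the `threshold` value are hypothetical. The abstract does not state the tokenization or the cutoff used.

```python
from typing import Iterable


def word_ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams (assumed tokenization: lowercase + split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: str, b: str, n: int = 5) -> float:
    """Jaccard similarity of the two texts' n-gram sets; 0.0 if either set is empty."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def flag_overlaps(
    train: Iterable[str],
    evals: Iterable[str],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
    n: int = 5,
) -> list[tuple[int, int, float]]:
    """Return (train_idx, eval_idx, score) for every pair scoring at or above threshold."""
    eval_list = list(evals)
    hits = []
    for i, t in enumerate(train):
        for j, e in enumerate(eval_list):
            score = jaccard(t, e, n)
            if score >= threshold:
                hits.append((i, j, score))
    return hits


if __name__ == "__main__":
    original = "A block of mass 2 kg slides down a frictionless incline of angle 30 degrees"
    paraphrase = "On a smooth ramp tilted at 30 degrees, a 2 kg block slides downward"
    # The paraphrase shares no 5-gram with the original, so this stage alone misses it.
    print(jaccard(original, original), jaccard(original, paraphrase))
```

A reworded problem shares few or no exact 5-grams with its source, which is consistent with the abstract's report that the single-stage check returns zero hits while the embedding and LLM-judge stages surface paraphrase candidates.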
Code & Models
- 🤗 shanyangmie/physics-r1-seed17-canonical-step63-fsdp · model
- 🤗 shanyangmie/physics-r1-seed23-canonical-step60-fsdp · model
- 🤗 shanyangmie/physics-r1-seed42-v4-step40-fsdp · model
- 🤗 shanyangmie/physics-r1-seed17-v4-step40-fsdp · model
- 🤗 shanyangmie/physics-r1-seed42-v4-step50-fsdp · model
- 🤗 shanyangmie/physics-r1-seed42-v4-step60-fsdp · model
- 🤗 shanyangmie/physics-r1-seed17-v4-step50-fsdp · model
- 🤗 shanyangmie/physics-r1-seed17-v4-step60-fsdp · model
- 🤗 shanyangmie/physics-r1-seed42-v4-step60 · model · 32 dl
- 🤗 shanyangmie/physics-r1-seed23 · model · 30 dl
