P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

Yun Luo; Futing Wang; Qianjia Cheng; Fangchen Yu; Haodi Lei; Jianhao Yan; Chenxi Li; Jiacheng Chen; Yufeng Zhao; Haiyuan Wan; Yuchen Zhang; Shenghe Zheng; Junchi Yao; Qingyang Zhang; Haonan He; Wenxuan Zeng; Li Sheng; Chengxing Xie; Yuxin Zuo; Yizhuo Li; Yulun Wu; Rui Huang; Dongzhan Zhou; Kai Chen; Yu Qiao; Lei Bai; Yu Cheng; Ning Ding; Bowen Zhou; Peng Ye; Ganqu Cui

arXiv:2602.09443 · cs.AI · February 11, 2026

Open Access · 2 Models

TL;DR

P1-VL is an open-source vision-language model designed for advanced scientific reasoning, particularly in physics, integrating multimodal perception and iterative self-verification to excel in Olympiad-level problems and benchmarks.

Contribution

It introduces P1-VL, which combines curriculum reinforcement learning with agentic augmentation to reach state-of-the-art performance among open-source models and gold-medal results on physics Olympiad benchmarks.

Findings

01. First open-source VLM to win 12 gold medals in physics Olympiads.

02. Achieves state-of-the-art performance among open-source models.

03. Ranks second globally in scientific reasoning benchmarks.

Abstract

The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical test anchor for binding abstract logic to physical reality. Physics demands that a model maintain physical consistency with the laws governing the universe, a task that fundamentally requires multimodal perception to ground abstract logic in reality. At the Olympiad level, diagrams are often constitutive rather than illustrative, containing essential constraints, such as boundary conditions and spatial symmetries, that are absent from the text. To bridge this visual-logical gap, we introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our method harmonizes Curriculum Reinforcement Learning, which employs progressive difficulty expansion to stabilize post-training,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling