PhysBrain 1.0 Technical Report

Shijie Lian, Bin Yu, Xiaopeng Lin, Changti Wu, Hang Yuan, Xiaolin Hu, Zhaolong Shen, Yuzhuo Miao, Haishan Liu, Yuxuan Tian, Yukun Shi, Cong Huang, and Kai Chen

arXiv:2605.15298 · cs.RO · May 18, 2026

TL;DR

PhysBrain 1.0 converts large-scale human egocentric video into physical commonsense supervision for vision-language models, then transfers the resulting priors to vision-language-action policies. It reports state-of-the-art results on multimodal QA and embodied control benchmarks.

Contribution

The report introduces a data engine that converts egocentric video into structured question-answer supervision for training physical priors in vision-language models.

Findings

01 Achieves SOTA on multiple benchmarks including ERQA, PhysBench, and RoboCasa.

02 Demonstrates strong out-of-domain performance, especially on SimplerEnv.

03 Shows that scaling physical commonsense from videos improves robot action understanding.

Abstract

Vision-language-action models have advanced rapidly, but robot trajectories alone provide limited coverage for learning broad physical understanding. PhysBrain 1.0 studies a complementary route: converting large-scale human egocentric video into structured physical commonsense supervision before robot adaptation. Our data engine extracts scene elements, spatial dynamics, action execution, and depth-aware relations, then turns them into question-answer supervision for training PhysBrain VLMs. The resulting physical priors are further transferred to VLA policies through a capability-preserving and language-sensitive adaptation design. Across multimodal QA benchmarks and embodied control benchmarks, including ERQA, PhysBench, SimplerEnv-WidowX, LIBERO, and RoboCasa, PhysBrain 1.0 achieves SOTA results and shows especially strong out-of-domain performance on SimplerEnv. These results…
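
To make the data-engine idea concrete, the following is a minimal sketch of how per-clip annotations covering the four categories named in the abstract (scene elements, spatial dynamics, action execution, depth-aware relations) might be turned into question-answer pairs. It is not the authors' implementation: the `ClipAnnotation` class, its field names, the `to_qa_pairs` helper, and the question templates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClipAnnotation:
    """Hypothetical per-clip annotation; every field name is an assumption."""
    scene_elements: list[str] = field(default_factory=list)
    spatial_dynamics: str = ""
    action_execution: str = ""
    # Each tuple is (object_a, depth relation, object_b), e.g. ("cup", "in front of", "kettle").
    depth_relations: list[tuple[str, str, str]] = field(default_factory=list)


def to_qa_pairs(ann: ClipAnnotation) -> list[tuple[str, str]]:
    """Turn one annotation into (question, answer) pairs using fixed templates."""
    qa: list[tuple[str, str]] = []
    if ann.scene_elements:
        qa.append(("Which objects are visible in the clip?",
                   ", ".join(ann.scene_elements)))
    if ann.spatial_dynamics:
        qa.append(("How does the scene change over the clip?",
                   ann.spatial_dynamics))
    if ann.action_execution:
        qa.append(("What action is the person performing?",
                   ann.action_execution))
    for obj_a, relation, obj_b in ann.depth_relations:
        qa.append((f"Where is the {obj_a} relative to the {obj_b} in depth?",
                   f"The {obj_a} is {relation} the {obj_b}."))
    return qa


# Illustrative, made-up annotation (not data from the paper).
example = ClipAnnotation(
    scene_elements=["cup", "kettle"],
    spatial_dynamics="The right hand moves the cup toward the kettle.",
    action_execution="Picking up the cup.",
    depth_relations=[("cup", "in front of", "kettle")],
)

for question, answer in to_qa_pairs(example):
    print(f"Q: {question}\nA: {answer}")
```

Fixed templates keep the sketch short; a real pipeline would likely vary question phrasing and verify answers against the video, which this example does not attempt.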

Code & Models

Repositories

phys-brain/PhysBrain-VLA (GitHub)
