PhyCritic: Multimodal Critic Models for Physical AI

Tianyi Xiong; Shihao Wang; Guilin Liu; Yi Dong; Ming Li; Heng Huang; Jan Kautz; Zhiding Yu

arXiv:2602.11124 · cs.CV · February 12, 2026

Open Access

TL;DR

PhyCritic is a multimodal critic model for physical AI tasks. Its two-stage training pipeline strengthens perception, reasoning, and judgment stability, and the model is reported to outperform existing models on relevant benchmarks.

Contribution

Introduces PhyCritic, a multimodal critic model for physical AI trained with a two-stage RLVR pipeline (a physical skill warmup stage followed by self-referential critic finetuning), improving judgment accuracy and physical reasoning capabilities.

Findings

01 Achieves significant performance improvements over baselines.

02 Enhances perception and reasoning in physical AI tasks.

03 Improves policy model performance in physically grounded tasks.

Abstract

With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existing critics are primarily trained in general visual domains such as captioning or image question answering, leaving physical AI tasks involving perception, causal reasoning, and planning largely underexplored. We introduce PhyCritic, a multimodal critic model optimized for physical AI through a two-stage RLVR pipeline: a physical skill warmup stage that enhances physically oriented perception and reasoning, followed by self-referential critic finetuning, where the critic generates its own prediction as an internal reference before judging candidate responses, improving judgment stability and…
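
The self-referential judging step described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Judgment`, `self_referential_judge`, `verifiable_reward`, `toy_answer`, and `toy_compare` are all hypothetical, the critic is stood in for by two plain callables, and the 0/1 reward on the pairwise preference is only one plausible reading of "verifiable" in RLVR.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Judgment:
    self_prediction: str  # the critic's own answer, produced before judging
    preferred: int        # index (0 or 1) of the candidate the critic prefers


def self_referential_judge(
    question: str,
    candidates: Tuple[str, str],
    answer_fn: Callable[[str], str],
    compare_fn: Callable[[str, str, Tuple[str, str]], int],
) -> Judgment:
    """Answer the question first, then judge the candidates against that answer."""
    # Step 1: the critic forms its own prediction as an internal reference.
    reference = answer_fn(question)
    # Step 2: the critic picks a candidate, conditioned on that reference.
    preferred = compare_fn(question, reference, candidates)
    return Judgment(self_prediction=reference, preferred=preferred)


def verifiable_reward(judgment: Judgment, gold_preferred: int) -> float:
    """Assumed reward: 1.0 if the preference matches the labeled one, else 0.0."""
    return 1.0 if judgment.preferred == gold_preferred else 0.0


# Toy stand-ins for a real critic model (hypothetical, for illustration only).
def toy_answer(question: str) -> str:
    return "the cup falls"


def toy_compare(question: str, reference: str, candidates: Tuple[str, str]) -> int:
    # Prefer the first candidate if it contains the critic's own prediction.
    return 0 if reference in candidates[0] else 1


judgment = self_referential_judge(
    "What happens if the cup is pushed off the table?",
    ("I think the cup falls to the floor", "The cup floats in place"),
    toy_answer,
    toy_compare,
)
reward = verifiable_reward(judgment, gold_preferred=0)
```

In a real system both callables would be prompts to the same critic model; the sketch only shows the ordering of the two steps and how a labeled preference could supply a checkable reward signal.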

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling