PhyCritic: Multimodal Critic Models for Physical AI
Tianyi Xiong, Shihao Wang, Guilin Liu, Yi Dong, Ming Li, Heng Huang, Jan Kautz, Zhiding Yu

TL;DR
PhyCritic is a multimodal critic model for physical AI tasks. A specialized two-stage training pipeline improves its perception, reasoning, and judgment stability, and it outperforms existing models on relevant benchmarks.
Contribution
Introduces PhyCritic, a multimodal critic model tailored to physical AI and trained with a novel two-stage pipeline that improves judgment accuracy and physical reasoning capabilities.
Findings
Outperforms baselines on physical AI benchmarks.
Enhances perception and reasoning in physical AI tasks.
Improves policy model performance in physically grounded tasks.
Abstract
With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existing critics are primarily trained in general visual domains such as captioning or image question answering, leaving physical AI tasks involving perception, causal reasoning, and planning largely underexplored. We introduce PhyCritic, a multimodal critic model optimized for physical AI through a two-stage RLVR pipeline: a physical skill warmup stage that enhances physically oriented perception and reasoning, followed by self-referential critic finetuning, where the critic generates its own prediction as an internal reference before judging candidate responses, improving judgment stability and…
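The self-referential step described above (the critic first produces its own prediction, then judges the candidates) can be illustrated with a toy verifiable reward. This is a minimal sketch under stated assumptions, not the paper's method: the abstract does not specify the output format or the reward, so the `<answer>` and `<verdict>` tags, the function names, and the 0.5 weighting are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CriticOutput:
    own_answer: str  # the critic's own prediction, used as an internal reference
    preferred: str   # which candidate response the critic prefers: "A" or "B"


def parse_critic_output(text: str) -> Optional[CriticOutput]:
    """Parse a critic response that uses hypothetical <answer> and <verdict> tags."""
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    verdict = re.search(r"<verdict>\s*([AB])\s*</verdict>", text)
    if answer is None or verdict is None:
        return None  # malformed output cannot be verified
    return CriticOutput(answer.group(1).strip(), verdict.group(1))


def verifiable_reward(
    text: str,
    gold_answer: str,
    gold_preference: str,
    answer_weight: float = 0.5,  # assumed weighting, not taken from the paper
) -> float:
    """Score one critic rollout against ground truth.

    The reward mixes two checkable signals: whether the critic's own
    prediction matches the reference answer, and whether its pairwise
    verdict matches the labeled preference.
    """
    parsed = parse_critic_output(text)
    if parsed is None:
        return 0.0
    answer_ok = parsed.own_answer.lower() == gold_answer.strip().lower()
    verdict_ok = parsed.preferred == gold_preference
    return answer_weight * answer_ok + (1.0 - answer_weight) * verdict_ok


if __name__ == "__main__":
    rollout = "<answer>the cup falls</answer> Response B ignores gravity. <verdict>A</verdict>"
    print(verifiable_reward(rollout, "The cup falls", "A"))
```

Both signals can be checked automatically against labels, which is what makes a reward of this kind usable in an RLVR setup. How PhyCritic actually combines them is not stated in the abstract.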
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
