LABSHIELD: A Multimodal Benchmark for Safety-Critical Reasoning and Planning in Scientific Laboratories
Qianpu Sun, Xiaowei Chi, Yuhan Rui, Ying Li, Kuangzhi Ge, Jiajun Li, Sirui Han, and Shanghang Zhang

TL;DR
LABSHIELD introduces a comprehensive benchmark for evaluating multimodal large language models in safety-critical laboratory scenarios, emphasizing hazard identification and safety reasoning aligned with OSHA standards.
Contribution
This work presents the first detailed safety benchmark for embodied AI in laboratories, comprising a taxonomy, diverse tasks, and an evaluation of multiple models' safety reasoning capabilities.
Findings
Models show a 32% performance drop on safety tasks relative to their general accuracy.
Significant gaps exist in hazard interpretation and safety-aware planning.
Current models lack sufficient safety-centric reasoning in high-stakes environments.
Abstract
Artificial intelligence is increasingly catalyzing scientific automation, with multimodal large language model (MLLM) agents evolving from lab assistants into self-driving lab operators. This transition imposes stringent safety requirements on laboratory environments, where fragile glassware, hazardous substances, and high-precision laboratory equipment render planning errors or misinterpreted risks potentially irreversible. However, the safety awareness and decision-making reliability of embodied agents in such high-stakes settings remain insufficiently defined and evaluated. To bridge this gap, we introduce LABSHIELD, a realistic multi-view benchmark designed to assess MLLMs in hazard identification and safety-critical reasoning. Grounded in U.S. Occupational Safety and Health Administration (OSHA) standards and the Globally Harmonized System (GHS), LABSHIELD establishes a rigorous…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Topic Modeling
