Toward Autonomous Laboratory Safety Monitoring with Vision Language Models: Learning to See Hazards Through Scene Structure
Trishna Chakraborty, Udita Ghosh, Aldair Ernesto Gongora, Ruben Glatt, Yue Dong, Jiachen Li, Amit K. Roy-Chowdhury, Chengyu Song

TL;DR
This paper explores using vision language models for autonomous safety monitoring in laboratories, introducing a structured data pipeline and a scene-graph-guided alignment method to improve hazard detection from visual data.
Contribution
The paper presents a data generation pipeline that converts textual laboratory scenarios into aligned (image, scene graph, ground truth) triples, and proposes a scene-graph-guided alignment technique to improve VLMs' hazard detection from images.
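To make the idea of an aligned (image, scene graph, ground truth) triple concrete, here is a minimal sketch of how one sample might be represented and how its scene graph could be serialized to text for a VLM prompt. Every name here (`Relation`, `SceneGraph`, `Sample`, `to_text`), the serialization format, and the example scenario are illustrative assumptions, not the authors' schema or data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One directed edge of a scene graph (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Objects in a lab scene plus the relations between them."""
    objects: list[str]
    relations: list[Relation] = field(default_factory=list)

    def to_text(self) -> str:
        # Flatten the graph into plain text so it can be placed in a prompt.
        lines = ["Objects: " + ", ".join(self.objects)]
        lines += [f"{r.subject} -- {r.predicate} --> {r.obj}" for r in self.relations]
        return "\n".join(lines)


@dataclass
class Sample:
    """An aligned triple: rendered image, scene graph, ground-truth label."""
    image_path: str          # assumed: path to the generated image
    scene_graph: SceneGraph
    ground_truth: str        # assumed: a free-text hazard description


# Invented example scenario, for illustration only.
graph = SceneGraph(
    objects=["researcher", "open flame", "ethanol bottle"],
    relations=[Relation("ethanol bottle", "next to", "open flame")],
)
sample = Sample("scene_0001.png", graph, "flammable liquid near ignition source")
print(sample.scene_graph.to_text())
```

Keeping the graph as structured data rather than free text means the same sample can feed either a text-only evaluation (via `to_text`) or an image-only one (via `image_path`), which is the comparison the findings describe.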
Findings
VLMs perform effectively when given textual scene graphs as input.
Performance drops significantly with visual-only inputs.
Scene-graph-guided alignment improves hazard detection from images.
Abstract
Laboratories are prone to severe injuries from minor unsafe actions, yet continuous safety monitoring -- beyond mandatory pre-lab safety training -- is limited by human availability. Vision language models (VLMs) offer promise for autonomous laboratory safety monitoring, but their effectiveness in realistic settings is unclear due to the lack of visual evaluation data, as most safety incidents are documented primarily as unstructured text. To address this gap, we first introduce a structured data generation pipeline that converts textual laboratory scenarios into aligned triples of (image, scene graph, ground truth), using large language models as scene graph architects and image generation models as renderers. Our experiments on the synthetic dataset of 1,207 samples across 362 unique scenarios and seven open- and closed-source models show that VLMs perform effectively given textual…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Materials Science · Data Visualization and Analytics
