Towards Safer Mobile Agents: Scalable Generation and Evaluation of Diverse Scenarios for VLMs
Takara Taniguchi, Kuniaki Saito, Atsushi Hashimoto

TL;DR
This paper introduces HazardForge, a scalable pipeline for generating diverse hazardous scenarios to evaluate vision language models in autonomous systems, revealing significant performance drops in complex, anomalous situations.
Contribution
The paper presents HazardForge, a novel method for synthesizing complex hazardous scenarios for VLM evaluation, and constructs MovSafeBench, a comprehensive benchmark for safety-critical assessment.
Findings
VLM performance drops significantly with anomalous objects
HazardForge enables scalable generation of hazardous scenarios
MovSafeBench covers diverse normal and anomalous object categories
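One way to make the first finding concrete is to score multiple-choice answers separately for normal and anomalous object categories and compare the two. The sketch below is only an illustration under stated assumptions: the record keys (`category`, `answer`, `prediction`), the function name `accuracy_by_category`, and the toy records are hypothetical and are not taken from MovSafeBench or the authors' evaluation code.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return MCQ accuracy per object category.

    Each record is a dict with hypothetical keys:
    'category' (e.g. "normal" or "anomalous"), 'answer', and 'prediction'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        if r["prediction"] == r["answer"]:
            correct[r["category"]] += 1
    return {c: correct[c] / total[c] for c in total}


# Made-up records used only to exercise the function; not results from the paper.
toy_records = [
    {"category": "normal", "answer": "A", "prediction": "A"},
    {"category": "normal", "answer": "B", "prediction": "B"},
    {"category": "anomalous", "answer": "C", "prediction": "A"},
    {"category": "anomalous", "answer": "A", "prediction": "A"},
]

scores = accuracy_by_category(toy_records)
# Gap between normal and anomalous accuracy on the toy data.
drop = scores["normal"] - scores["anomalous"]
```

Reporting the gap per category, rather than a single pooled accuracy, keeps a drop on anomalous objects from being hidden by a larger number of normal-object questions.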
Abstract
Vision Language Models (VLMs) are increasingly deployed in autonomous vehicles and mobile systems, making it crucial to evaluate their ability to support safer decision-making in complex environments. However, existing benchmarks inadequately cover diverse hazardous situations, especially anomalous scenarios with spatio-temporal dynamics. While image editing models are a promising means to synthesize such hazards, it remains challenging to generate well-formulated scenarios that include moving, intrusive, and distant objects frequently observed in the real world. To address this gap, we introduce HazardForge, a scalable pipeline that leverages image editing models to generate these scenarios with layout decision algorithms and validation modules. Using HazardForge, we construct MovSafeBench, a multiple-choice question (MCQ) benchmark comprising 7,254 images and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robotic Path Planning Algorithms
