Towards Safer Mobile Agents: Scalable Generation and Evaluation of Diverse Scenarios for VLMs

Takara Taniguchi; Kuniaki Saito; Atsushi Hashimoto

arXiv:2601.08470·cs.CV·January 14, 2026

TL;DR

This paper introduces HazardForge, a scalable pipeline for generating diverse hazardous scenarios to evaluate vision language models (VLMs) in autonomous systems, and reveals that VLM performance drops significantly in complex, anomalous situations.

Contribution

The paper presents HazardForge, a novel method for synthesizing complex hazardous scenarios for VLM evaluation, and constructs MovSafeBench, a comprehensive benchmark for safety-critical assessment.

Findings

01

VLM performance drops significantly with anomalous objects

02

HazardForge enables scalable generation of hazardous scenarios

03

MovSafeBench covers diverse normal and anomalous object categories

Abstract

Vision Language Models (VLMs) are increasingly deployed in autonomous vehicles and mobile systems, making it crucial to evaluate their ability to support safer decision-making in complex environments. However, existing benchmarks inadequately cover diverse hazardous situations, especially anomalous scenarios with spatio-temporal dynamics. While image editing models are a promising means to synthesize such hazards, it remains challenging to generate well-formulated scenarios that include moving, intrusive, and distant objects frequently observed in the real world. To address this gap, we introduce HazardForge, a scalable pipeline that leverages image editing models to generate these scenarios with layout decision algorithms and validation modules. Using HazardForge, we construct MovSafeBench, a multiple-choice question (MCQ) benchmark comprising 7,254 images and…
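The abstract does not spell out how MovSafeBench is scored. As a minimal sketch, assuming each MCQ item records a category label (normal vs. anomalous object), a ground-truth option letter and the option a VLM chose, per-category accuracy (the kind of breakdown behind the finding that performance drops on anomalous objects) could be computed as below. The `MCQItem` record and the `accuracy_by_category` helper are hypothetical illustrations, not the authors' code or data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCQItem:
    # Hypothetical record format; MovSafeBench's actual schema may differ.
    category: str    # e.g. "normal" or "anomalous"
    answer: str      # ground-truth option letter, e.g. "B"
    prediction: str  # option letter chosen by the VLM


def accuracy_by_category(items):
    """Return {category: accuracy} for a list of MCQItem records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.category] += 1
        # Compare option letters case-insensitively, ignoring surrounding whitespace.
        if item.prediction.strip().upper() == item.answer.strip().upper():
            correct[item.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


if __name__ == "__main__":
    # Toy predictions for illustration only; these are not results from the paper.
    items = [
        MCQItem("normal", "A", "A"),
        MCQItem("normal", "C", "C"),
        MCQItem("anomalous", "B", "D"),
        MCQItem("anomalous", "A", "A"),
    ]
    scores = accuracy_by_category(items)
    print(scores)  # {'normal': 1.0, 'anomalous': 0.5}
    gap = scores["normal"] - scores["anomalous"]
    print(f"normal - anomalous gap: {gap:.2f}")
```

Reporting accuracy per category rather than a single pooled score keeps a drop on the rarer anomalous items from being masked by strong performance on normal ones.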

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robotic Path Planning Algorithms