Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition

Dasol Choi; Seunghyun Lee; Youngsook Song

arXiv:2505.15367 · cs.CV · September 30, 2025


TL;DR

This paper evaluates vision-language models in safety-critical scenarios, revealing a prevalent overreaction bias where models often misclassify safe situations as emergencies, highlighting the need for improved contextual understanding.

Contribution

Introduces VERI, a benchmark for assessing VLMs' reliability in emergency recognition, and uncovers systematic overreaction issues affecting safety-critical applications.

Findings

1. Models achieve high recall but low precision in emergency detection.
2. Seven safe scenarios were misclassified as dangerous by every model evaluated.
3. Overinterpretation of contextual cues accounts for 88-98% of errors.
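The second finding above, safe scenes that every model flags, amounts to intersecting the per-model sets of flagged safe images. The sketch below assumes each model's output has been reduced to a set of image ids it labeled dangerous; the function name, the model names, and the ids are hypothetical and are not taken from the VERI code or data.

```python
from typing import Dict, Iterable, Set


def universally_misclassified(
    flagged: Dict[str, Set[str]], safe_ids: Iterable[str]
) -> Set[str]:
    """Return the safe image ids that every model flagged as dangerous.

    flagged: model name -> set of image ids that model labeled dangerous
             (hypothetical representation of model outputs).
    safe_ids: ids of the safe counterpart images.
    """
    if not flagged:
        return set()  # no models evaluated, so nothing is "universal"
    common = set(safe_ids)
    for ids in flagged.values():
        common &= ids  # keep only safe ids this model also flagged
    return common


if __name__ == "__main__":
    # Toy data with hypothetical ids: s* are safe images, e* are emergencies.
    flagged = {
        "model_a": {"s1", "s2", "e1"},
        "model_b": {"s2", "s3", "e1"},
    }
    print(universally_misclassified(flagged, ["s1", "s2", "s3"]))  # {'s2'}
```

Starting from the safe ids rather than from the first model's set keeps emergency images out of the result even when every model flags them.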

Abstract

Vision-Language Models (VLMs) have shown capabilities in interpreting visual content, but their reliability in safety-critical scenarios remains insufficiently explored. We introduce VERI, a diagnostic benchmark comprising 200 synthetic images (100 contrastive pairs) and an additional 50 real-world images (25 pairs) for validation. Each emergency scene is paired with a visually similar but safe counterpart through human verification. Using a two-stage evaluation protocol (risk identification and emergency response), we assess 17 VLMs across medical emergencies, accidents, and natural disasters. Our analysis reveals an "overreaction problem": models achieve high recall (70-100%) but suffer from low precision, misclassifying 31-96% of safe situations as dangerous. Seven safe scenarios were universally misclassified by all models. This "better-safe-than-sorry" bias stems from contextual…
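The recall/precision gap described in the abstract can be made concrete with a small tally over contrastive pairs. This is a minimal sketch, assuming the risk-identification stage is scored as a binary decision with "dangerous" as the positive class; the function names and the toy counts are hypothetical and do not reproduce the paper's evaluation code or results.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Counts:
    tp: int  # emergency image, model says dangerous
    fn: int  # emergency image, model says safe
    fp: int  # safe counterpart, model says dangerous (overreaction)
    tn: int  # safe counterpart, model says safe


def tally(labels: Sequence[bool], preds: Sequence[bool]) -> Counts:
    """labels/preds: True = emergency/dangerous, False = safe."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    pairs = list(zip(labels, preds))
    return Counts(
        tp=sum(y and p for y, p in pairs),
        fn=sum(y and not p for y, p in pairs),
        fp=sum((not y) and p for y, p in pairs),
        tn=sum((not y) and (not p) for y, p in pairs),
    )


def risk_metrics(c: Counts) -> Tuple[float, float, float]:
    """Return (recall, precision, share of safe images flagged as dangerous)."""
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    safe_flag_rate = c.fp / (c.fp + c.tn) if c.fp + c.tn else 0.0
    return recall, precision, safe_flag_rate


if __name__ == "__main__":
    # Hypothetical model on 100 contrastive pairs: flags every emergency
    # image and 60 of the 100 safe counterparts.
    labels = [True] * 100 + [False] * 100
    preds = [True] * 100 + [True] * 60 + [False] * 40
    print(risk_metrics(tally(labels, preds)))  # (1.0, 0.625, 0.6)
```

With these toy numbers, recall is perfect while precision falls to 0.625 because 60 of the 100 safe images are flagged. Because the pairs are balanced, a model that flags every image would still reach recall 1.0 but only precision 0.5, which is why recall alone cannot reveal this kind of overreaction.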


Code & Models

Repositories

Dasol-Choi/VERI-Emergency (PyTorch)

Datasets

Dasool/VERI-Emergency
dataset · 42 downloads


Taxonomy

Topics: Multimodal Machine Learning Applications