The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning

Renmiao Chen; Yida Lu; Shiyao Cui; Xuan Ouyang; Victor Shea-Jay Huang; Shumin Zhang; Chengwei Pan; Han Qiu; Minlie Huang

arXiv:2601.14127·cs.CV·January 21, 2026

TL;DR

This paper introduces MIR-SafetyBench, a benchmark for assessing safety risks in multi-image reasoning by MLLMs. Evaluations show that models with more advanced multi-image reasoning can be more vulnerable, and that many responses labeled as safe are superficial or evasive.

Contribution

The paper presents the first safety benchmark focused on multi-image reasoning in MLLMs (2,676 instances across a taxonomy of 9 multi-image relations) and evaluates 19 MLLMs on it, finding that stronger multi-image reasoning can coincide with greater vulnerability.

Findings

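01. Models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench.

02. Many responses labeled as safe are superficial, often driven by misunderstanding or evasive, non-committal replies.

03. Unsafe generations show lower attention entropy than safe ones on average.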

Abstract

As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations. Our extensive evaluations on 19 MLLMs reveal a troubling trend: models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench. Beyond attack success rates, we find that many responses labeled as safe are superficial, often driven by misunderstanding or evasive, non-committal replies. We further observe that unsafe generations exhibit lower attention entropy than safe ones on average. This internal signature suggests a possible risk that models may over-focus on task solving while neglecting safety…
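
The attention-entropy observation can be made concrete with a small sketch. The abstract does not say how the entropy is computed (which layers, heads, or tokens are used), so the code below only illustrates the standard Shannon entropy of a single attention distribution; the function name `attention_entropy` and the two example weight vectors are hypothetical and are not taken from the paper.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    `weights` holds non-negative attention weights over the attended
    positions. They are renormalized to sum to 1 before the entropy is
    computed. This is the textbook definition, not necessarily the exact
    measure used by the paper's authors.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    probs = [w / total for w in weights]
    # Zero-probability positions contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical distributions over four positions (illustration only).
focused = [0.97, 0.01, 0.01, 0.01]   # attention concentrated on one position
diffuse = [0.25, 0.25, 0.25, 0.25]   # attention spread evenly

print(attention_entropy(focused) < attention_entropy(diffuse))
```

Under this definition, a distribution concentrated on a few positions has lower entropy than an evenly spread one, which is the sense in which lower entropy is read as more narrowly focused attention.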

Code & Models

Datasets

thu-coai/MIR-SafetyBench (dataset, 22 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)