Visual Distraction Undermines Moral Reasoning in Vision-Language Models

Xinyi Yang; Chenheng Xu; Weijun Hong; Ce Mo; Qian Wang; Fang Fang; Yixin Zhu

arXiv:2603.16445 · cs.AI · March 18, 2026

Open Access

TL;DR

This paper demonstrates that visual inputs can significantly alter moral reasoning in vision-language models and bypass safety measures that are effective in text-only contexts, which highlights the need for improved multimodal safety alignment.

Contribution

The authors introduce Moral Dilemma Simulation, a novel multimodal benchmark based on Moral Foundation Theory, to systematically analyze how visual inputs influence moral decision-making in AI models.

Findings

01. Visual inputs activate intuition-like pathways overriding deliberate reasoning.
02. Language-tuned safety filters fail to constrain visual processing in models.
03. Multimodal safety alignment is urgently needed for AI moral reasoning.

Abstract

Moral reasoning is fundamental to safe Artificial Intelligence (AI), yet ensuring its consistency across modalities becomes critical as AI systems evolve from text-based assistants to embodied agents. Current safety techniques demonstrate success in textual contexts, but concerns remain about generalization to visual inputs. Existing moral evaluation benchmarks rely on text-only formats and lack systematic control over variables that influence moral decision-making. Here we show that visual inputs fundamentally alter moral decision-making in state-of-the-art (SOTA) Vision-Language Models (VLMs), bypassing text-based safety mechanisms. We introduce Moral Dilemma Simulation (MDS), a multimodal benchmark grounded in Moral Foundation Theory (MFT) that enables mechanistic analysis through orthogonal manipulation of visual and contextual variables. The evaluation reveals that the vision…

Taxonomy

Topics: Multimodal Machine Learning Applications · Ethics and Social Impacts of AI · Psychology of Moral and Emotional Judgment