Visual Distraction Undermines Moral Reasoning in Vision-Language Models
Xinyi Yang, Chenheng Xu, Weijun Hong, Ce Mo, Qian Wang, Fang Fang, Yixin Zhu

TL;DR
This paper demonstrates that visual inputs can significantly alter moral reasoning in vision-language models and bypass safety measures that are effective in text-only contexts, which highlights the need for improved multimodal safety alignment.
Contribution
The authors introduce Moral Dilemma Simulation, a novel multimodal benchmark based on Moral Foundation Theory, to systematically analyze how visual inputs influence moral decision-making in AI models.
Findings
Visual inputs activate intuition-like pathways that override deliberate reasoning.
Language-tuned safety filters fail to constrain visual processing in models.
Multimodal safety alignment is urgently needed for AI moral reasoning.
Abstract
Moral reasoning is fundamental to safe Artificial Intelligence (AI), yet ensuring its consistency across modalities becomes critical as AI systems evolve from text-based assistants to embodied agents. Current safety techniques demonstrate success in textual contexts, but concerns remain about generalization to visual inputs. Existing moral evaluation benchmarks rely on text-only formats and lack systematic control over variables that influence moral decision-making. Here we show that visual inputs fundamentally alter moral decision-making in state-of-the-art (SOTA) Vision-Language Models (VLMs), bypassing text-based safety mechanisms. We introduce Moral Dilemma Simulation (MDS), a multimodal benchmark grounded in Moral Foundation Theory (MFT) that enables mechanistic analysis through orthogonal manipulation of visual and contextual variables. The evaluation reveals that the vision…
Taxonomy
Topics: Multimodal Machine Learning Applications · Ethics and Social Impacts of AI · Psychology of Moral and Emotional Judgment
