Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models

Zhining Liu; Tianyi Wang; Xiao Lin; Penghao Ouyang; Gaotang Li; Ze Yang; Hui Liu; Sumit Keswani; Vishwa Pardeshi; Huijun Zhao; Wei Fan; Hanghang Tong

arXiv:2601.17082 · cs.CY · January 27, 2026

TL;DR

This paper investigates whether the moral judgments of Vision-Language Models (VLMs) remain stable under textual and visual perturbations. It finds that they are highly fragile, underscoring the need for moral robustness in responsible AI deployment.

Contribution

It introduces the concept of moral robustness in VLMs, systematically evaluates their vulnerabilities, and proposes lightweight interventions to improve moral stability.

Findings

01. VLMs' moral judgments are highly fragile under simple perturbations.

02. Stronger instruction-following models are more susceptible to persuasion.

03. Lightweight interventions can partially restore moral stability.

Abstract

Despite substantial efforts toward improving the moral alignment of Vision-Language Models (VLMs), it remains unclear whether their ethical judgments are stable in realistic settings. This work studies moral robustness in VLMs, defined as the ability to preserve moral judgments under textual and visual perturbations that do not alter the underlying moral context. We systematically probe VLMs with a diverse set of model-agnostic multimodal perturbations and find that their moral stances are highly fragile, frequently flipping under simple manipulations. Our analysis reveals systematic vulnerabilities across perturbation types, moral domains, and model scales, including a sycophancy trade-off where stronger instruction-following models are more susceptible to persuasion. We further show that lightweight inference-time interventions can partially restore moral stability. These results…
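
One way to make the definition of moral robustness concrete is to count how often a judgment changes when the perturbation leaves the moral context untouched. The sketch below is illustrative only: the `flip_rate` function, the label strings, and the example data are assumptions made for this example, and the abstract does not state which metric the paper actually reports.

```python
from typing import Sequence


def flip_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Fraction of scenarios whose judgment label changes after perturbation.

    Hypothetical helper: a lower value means more stable (more robust) judgments.
    """
    if len(original) != len(perturbed):
        raise ValueError("judgment lists must have the same length")
    if not original:
        raise ValueError("need at least one judgment pair")
    # Count scenario pairs where the label before and after differs.
    flips = sum(o != p for o, p in zip(original, perturbed))
    return flips / len(original)


# Made-up labels for four scenarios, before and after a perturbation
# that does not alter the underlying moral context.
before = ["wrong", "wrong", "acceptable", "wrong"]
after = ["wrong", "acceptable", "acceptable", "acceptable"]
rate = flip_rate(before, after)
print(f"flip rate: {rate:.2f}")
```

Under this reading, a perfectly robust model would have a flip rate of 0 for every perturbation type, and "highly fragile" corresponds to a flip rate well above 0.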

Taxonomy

Topics: Multimodal Machine Learning Applications · Ethics and Social Impacts of AI · Topic Modeling