Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment

Jea Kwon; Luiz Felipe Vecchietti; Sungwon Park; Meeyoung Cha

arXiv:2511.13290 · cs.AI · November 18, 2025

Open Access

TL;DR

This paper investigates how uncertainty affects moral decision-making in large language models, revealing that increasing model uncertainty through dropout improves alignment with human moral judgments in ethical dilemmas.

Contribution

It introduces a method to quantify and manipulate moral uncertainty in LLMs using dropout, enhancing their alignment with human moral preferences.
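As a rough illustration of what "manipulating uncertainty with dropout" can mean, the sketch below applies inference-time (Monte Carlo) dropout to a toy logistic scorer and collects the probability of choosing option A over repeated stochastic passes. Everything here is an assumption made for illustration: the `features` and `weights` values, the `dropout_forward` and `sample_choice_probs` helpers, and the toy scorer itself are hypothetical and are not the models or the procedure used in the paper.

```python
import math
import random


def dropout_forward(features, weights, drop_rate, rng):
    """One stochastic pass of a toy logistic scorer.

    Each weighted feature is kept with probability (1 - drop_rate) and
    rescaled by 1 / (1 - drop_rate) (inverted dropout). Returns the
    probability of choosing option A in a binary dilemma.
    """
    keep = 1.0 - drop_rate
    logit = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            logit += (x * w) / keep
    return 1.0 / (1.0 + math.exp(-logit))


def sample_choice_probs(features, weights, drop_rate, n_passes=100, seed=0):
    """Run n_passes stochastic passes and return the list of P(option A)."""
    rng = random.Random(seed)
    return [dropout_forward(features, weights, drop_rate, rng)
            for _ in range(n_passes)]


# Hypothetical encoding of one dilemma and hypothetical toy parameters.
features = [1.0, -0.5, 2.0, 0.3]
weights = [0.8, 1.2, 0.5, -0.7]

# With drop_rate=0.0 every pass is identical; with dropout the passes disagree.
deterministic = sample_choice_probs(features, weights, drop_rate=0.0, n_passes=5)
stochastic = sample_choice_probs(features, weights, drop_rate=0.3, n_passes=200)
```

The spread of `stochastic` relative to `deterministic` is the kind of added variability that dropout introduces; in a real LLM the masks would be applied to internal activations rather than to input features.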

Findings

01

Model confidence varies more across models than across moral dimensions, suggesting that moral uncertainty is shaped mainly by model architecture and training method.

02

Dropout increases model entropy, most notably its mutual-information component.

03

Higher mutual information correlates with better human-LLM moral alignment.
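The findings above refer to entropy and mutual information. A minimal sketch of the standard decomposition for a binary choice is given below, assuming a list of per-pass probabilities from repeated stochastic forward passes: total entropy is the entropy of the mean prediction, conditional entropy is the mean of the per-pass entropies, and mutual information is their difference. The function names and the example probabilities are hypothetical, and this is the textbook decomposition rather than a confirmed reproduction of the paper's exact measure.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) choice; defined as 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def decompose_uncertainty(probs):
    """Split predictive uncertainty over stochastic passes.

    probs: P(option A) from each stochastic forward pass.
    Returns (total, conditional, mutual_info) where
      total       = H(mean of probs)
      conditional = mean of H(p) over passes
      mutual_info = total - conditional
    """
    mean_p = sum(probs) / len(probs)
    total = binary_entropy(mean_p)
    conditional = sum(binary_entropy(p) for p in probs) / len(probs)
    return total, conditional, total - conditional


# Hypothetical cases: passes that agree vs. passes that confidently disagree.
agree = [0.9, 0.9, 0.9, 0.9]
disagree = [0.99, 0.01, 0.99, 0.01]

total_a, cond_a, mi_a = decompose_uncertainty(agree)
total_d, cond_d, mi_d = decompose_uncertainty(disagree)
```

When every pass gives the same probability, mutual information is essentially zero and all uncertainty is conditional. When passes are individually confident but disagree with each other, total entropy approaches 1 bit and most of it is mutual information.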

Abstract

Humans display significant uncertainty when confronted with moral dilemmas, yet the extent of such uncertainty in machines and AI agents remains underexplored. Recent studies have confirmed the overly confident tendencies of machine-generated responses, particularly in large language models (LLMs). As these systems are increasingly embedded in ethical decision-making scenarios, it is important to understand their moral reasoning and the inherent uncertainties in building reliable AI systems. This work examines how uncertainty influences moral decisions in the classical trolley problem, analyzing responses from 32 open-source models and 9 distinct moral dimensions. We first find that variance in model confidence is greater across models than within moral dimensions, suggesting that moral uncertainty is predominantly shaped by model architecture and training method. To quantify…

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning