MoralReason: Generalizable Moral Decision Alignment For LLM Agents Using Reasoning-Level Reinforcement Learning

Zhiyu An; Wan Du

arXiv:2511.12271·cs.AI·November 18, 2025

TL;DR

This paper introduces a method for training large language models to apply a chosen moral reasoning framework consistently in new, unseen scenarios.

Contribution

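It presents Moral-Reason-QA, a dataset of moral scenarios with framework-specific reasoning traces, and a reinforcement learning approach based on Group Relative Policy Optimization, with the goal of enabling LLMs to generalize moral reasoning to out-of-distribution scenarios.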

Findings

01. Significant improvement in moral alignment scores for unseen scenarios.
02. Demonstrated generalization across utilitarian and deontological frameworks.
03. Identified training challenges and future research directions.

Abstract

Large language models are increasingly influencing human moral decisions, yet current approaches focus primarily on evaluating rather than actively steering their moral decisions. We formulate this as an out-of-distribution moral alignment problem, where LLM agents must learn to apply consistent moral reasoning frameworks to scenarios beyond their training distribution. We introduce Moral-Reason-QA, a novel dataset extending 680 human-annotated, high-ambiguity moral scenarios with framework-specific reasoning traces across utilitarian, deontological, and virtue ethics, enabling systematic evaluation of moral generalization in realistic decision contexts. Our learning approach employs Group Relative Policy Optimization with composite rewards that simultaneously optimize decision alignment and framework-specific reasoning processes to facilitate learning of the underlying moral…
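To make the idea of a composite reward under Group Relative Policy Optimization more concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the equal weights, and the example scores are all hypothetical, and the abstract does not specify how the two reward terms are actually computed or combined. The sketch only shows the general pattern, in which each sampled response gets a scalar reward mixing a decision-alignment score with a reasoning score, and advantages are obtained by normalizing rewards within the group of responses sampled for the same scenario.

```python
from statistics import mean, pstdev


def composite_reward(decision_score, reasoning_score,
                     w_decision=0.5, w_reasoning=0.5):
    """Weighted sum of two scores in [0, 1].

    The weights are illustrative assumptions, not values from the paper.
    """
    return w_decision * decision_score + w_reasoning * reasoning_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four responses sampled for one scenario:
# (decision_score, reasoning_score)
group = [(1.0, 0.8), (1.0, 0.2), (0.0, 0.6), (0.0, 0.0)]

rewards = [composite_reward(d, r) for d, r in group]
advantages = group_relative_advantages(rewards)

for (d, r), reward, adv in zip(group, rewards, advantages):
    print(f"decision={d:.1f} reasoning={r:.1f} "
          f"reward={reward:.2f} advantage={adv:+.2f}")
```

Because advantages are centered within the group, a response is reinforced only to the extent that it scores better than its siblings for the same scenario, so no separate value network is needed in this scheme.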

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning