Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants
Alessio Galatolo, Luca Alberto Rappuoli, Katie Winkle, Meriem Beloucif

TL;DR
This paper introduces a new framework and benchmark to evaluate large language models as Artificial Moral Assistants, emphasizing their capacity for explicit moral reasoning beyond superficial alignment, and reveals significant variability and shortcomings in current models.
Contribution
The paper develops a formal philosophical framework and a corresponding benchmark for assessing LLMs' moral reasoning abilities as Artificial Moral Assistants (AMAs), and highlights areas for improvement.
Findings
Models vary significantly in their moral reasoning capabilities.
Models show persistent shortcomings in abductive moral reasoning.
Dedicated strategies are needed to improve moral reasoning in LLMs.
Abstract
The recent rise in popularity of large language models (LLMs) has prompted considerable concerns about their moral capabilities. Although considerable effort has been dedicated to aligning LLMs with human moral values, existing benchmarks and evaluations remain largely superficial, typically measuring alignment based on final ethical verdicts rather than explicit moral reasoning. In response, this paper aims to advance the investigation of LLMs' moral capabilities by examining their capacity to function as Artificial Moral Assistants (AMAs), systems envisioned in the philosophical literature to support human moral deliberation. We assert that qualifying as an AMA requires more than what state-of-the-art alignment techniques aim to achieve: not only must AMAs be able to discern ethically problematic situations, they should also be able to actively reason about them, navigating between…
Taxonomy
Topics: Artificial Intelligence in Law · Law, AI, and Intellectual Property · Ethics and Social Impacts of AI
