Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

Jingyan Zhou; Minda Hu; Junan Li; Xiaoying Zhang; Xixin Wu; Irwin King; Helen Meng

arXiv:2308.15399 · cs.CL · July 2, 2024 · 5 cites

TL;DR

This paper introduces a top-down framework for guiding large language models to perform moral reasoning based on established moral theories, aiming to improve explainability and address limitations of data-driven approaches.

Contribution

The paper proposes a flexible, theory-guided top-down approach to moral reasoning in LLMs that can incorporate moral theories from interdisciplinary research, and it analyzes how well those theories align with existing morality datasets.

Findings

01. The framework is effective on datasets derived from moral theories.

02. Different moral theories show alignment with existing morality datasets.

03. The analysis reveals both the potential and the flaws of current moral AI resources.

Abstract

Making moral judgments is an essential step toward developing ethical AI systems. Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality. These approaches have been criticized for overgeneralizing the moral stances of a limited group of annotators and lacking explainability. This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research. The theory-guided top-down framework can incorporate various moral theories. Our experiments demonstrate the effectiveness of the proposed framework on datasets derived from moral theories. Furthermore, we show the alignment between different moral theories and existing morality datasets. Our analysis exhibits the…

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Explainable Artificial Intelligence (XAI)