Are LLMs complicated ethical dilemma analyzers?

Jiashen (Jason) Du; Jesse Yao; Allen Liu; Zhekai Zhang

arXiv:2505.08106·cs.CL·May 14, 2025

TL;DR

This paper evaluates whether large language models can emulate human ethical reasoning by benchmarking their responses against expert opinions on real-world dilemmas, revealing strengths and limitations in their moral judgment capabilities.

Contribution

Introduces a novel benchmark dataset and a comprehensive evaluation framework for assessing LLMs' ethical reasoning against expert responses.

Findings

01 LLMs outperform non-expert humans in lexical and structural similarity to expert responses

02 GPT-4o-mini shows the most consistent performance

03 Models struggle with historical grounding and nuanced resolution strategies

Abstract

One open question in the study of Large Language Models (LLMs) is whether they can emulate human ethical reasoning and act as believable proxies for human judgment. To investigate this, we introduce a benchmark dataset comprising 196 real-world ethical dilemmas and expert opinions, each segmented into five structured components: Introduction, Key Factors, Historical Theoretical Perspectives, Resolution Strategies, and Key Takeaways. We also collect non-expert human responses for comparison, limited to the Key Factors section due to their brevity. We evaluate multiple frontier LLMs (GPT-4o-mini, Claude-3.5-Sonnet, Deepseek-V3, Gemini-1.5-Flash) using a composite metric framework based on BLEU, Damerau-Levenshtein distance, TF-IDF cosine similarity, and Universal Sentence Encoder similarity. Metric weights are computed through an inversion-based ranking alignment and pairwise AHP…
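To make the composite-metric idea concrete, here is a minimal sketch of how per-metric similarity scores might be combined into one weighted score. It is an illustration under stated assumptions, not the authors' implementation. It covers only two of the four metrics (a token-level edit similarity using the optimal string alignment variant of Damerau-Levenshtein, and a smoothed TF-IDF cosine). BLEU and Universal Sentence Encoder similarity are left out because they need external libraries or models. The function names, the example texts, and the weights are all hypothetical. In the paper the weights come from the inversion-based ranking alignment and AHP procedure, which is not reproduced here.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Works on any two indexable sequences (strings or token lists).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def edit_similarity(candidate, reference):
    """Token-level edit distance normalized into a [0, 1] similarity."""
    a, b = candidate.lower().split(), reference.lower().split()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def tfidf_cosine(candidate, reference, corpus):
    """Cosine similarity of TF-IDF vectors, with IDF estimated from `corpus`."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))

    def vectorize(text):
        tf = Counter(text.lower().split())
        # Smoothed IDF so unseen terms do not divide by zero.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}

    u, v = vectorize(candidate), vectorize(reference)
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def composite_score(scores, weights):
    """Weighted average of per-metric scores; weights are renormalized."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical example texts and weights, for illustration only.
expert = "Patient autonomy and informed consent are the key factors."
model = "The key factors are informed consent and patient autonomy."
scores = {
    "edit": edit_similarity(model, expert),
    "tfidf": tfidf_cosine(model, expert, corpus=[expert, model]),
}
weights = {"edit": 0.4, "tfidf": 0.6}
print(round(composite_score(scores, weights), 3))
```

Each metric is mapped to a similarity in [0, 1] before weighting, so the composite score stays in the same range and the weights can be read as relative importance.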

Code & Models

Repositories

alt-js/ethicallm (official)
