A Comparative Analysis on Ethical Benchmarking in Large Language Models

Kira Sam; Raja Vavekanand

arXiv:2410.19753 · cs.CY · November 11, 2024

TL;DR

This paper introduces two new ethical benchmarking datasets for large language models, emphasizing real-world dilemmas and worst-case scenarios to improve robustness and ecological validity in evaluating AI ethics.

Contribution

The authors present two novel machine ethics (ME) benchmarks, Triage and MedLaw, with MedLaw being fully AI-generated. They also incorporate context perturbations to assess how robust models are when faced with worst-case ethical challenges.
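The gap between typical and worst-case behaviour under context perturbations can be made concrete with a small scoring sketch. It assumes each question is posed once unperturbed and several times with perturbed context, and that a question counts as solved in the worst case only if the model decides correctly under every variant. The helper names (average_accuracy, worst_case_accuracy), the toy data, and this scoring rule are hypothetical illustrations, not the authors' published metric.

```python
# Hypothetical scoring helpers; the "worst case" rule below (correct under
# every perturbed variant) is an illustrative assumption, not the paper's metric.


def average_accuracy(results: dict[str, list[bool]]) -> float:
    """Share of correct answers over every (question, variant) pair."""
    outcomes = [ok for variants in results.values() for ok in variants]
    return sum(outcomes) / len(outcomes)


def worst_case_accuracy(results: dict[str, list[bool]]) -> float:
    """Share of questions answered correctly under all of their variants."""
    robust = [all(variants) for variants in results.values()]
    return sum(robust) / len(robust)


# Toy data: each question was asked once unperturbed and twice with
# context perturbations; True marks a correct decision.
results = {
    "q1": [True, True, True],
    "q2": [True, False, True],
}

print(f"average:    {average_accuracy(results):.3f}")
print(f"worst case: {worst_case_accuracy(results):.3f}")
```

On this toy data the average accuracy is 5/6 while the worst-case accuracy is 1/2, because a single failed perturbation on q2 is enough to mark that question as not robust. This illustrates how a model can look strong on average yet score poorly in the worst case.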

Findings

01

Ethics prompting does not consistently improve decision quality.

02

Context perturbations significantly decrease model performance.

03

Worst-case performance does not always correlate with general capabilities.

Abstract

This work contributes to the field of Machine Ethics (ME) benchmarking, which develops tests to assess whether intelligent systems accurately represent human values and act accordingly. We identify three major issues with current ME benchmarks: limited ecological validity due to unrealistic ethical dilemmas, unstructured question generation without clear inclusion/exclusion criteria, and a lack of scalability due to reliance on human annotations. Moreover, benchmarks often fail to include sufficient syntactic variations, reducing the robustness of findings. To address these gaps, we introduce two new ME benchmarks: the Triage Benchmark and the Medical Law (MedLaw) Benchmark, both featuring real-world ethical dilemmas from the medical domain. The MedLaw Benchmark, fully AI-generated, offers a scalable alternative. We also introduce context perturbations in our benchmarks to assess…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Ethics and Social Impacts of AI · Computational and Text Analysis Methods