A Comparative Analysis on Ethical Benchmarking in Large Language Models
Kira Sam, Raja Vavekanand

TL;DR
This paper introduces two new ethical benchmarking datasets for large language models, emphasizing real-world dilemmas and worst-case scenarios to improve robustness and ecological validity in evaluating AI ethics.
Contribution
The authors present two novel Machine Ethics (ME) benchmarks, Triage and MedLaw, the latter fully AI-generated, and incorporate context perturbations to assess model robustness against worst-case ethical challenges.
Findings
Ethics prompting does not consistently improve decision quality.
Context perturbations significantly decrease model performance.
Worst-case performance does not always correlate with general capabilities.
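To make the difference between average and worst-case performance concrete, here is a minimal Python sketch of how accuracy under context perturbations could be summarized. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `accuracy_summary`, the perturbation labels (`original`, `paraphrase`, `distractor`), and the toy data are all hypothetical and do not come from the paper.

```python
from statistics import mean


def accuracy_summary(results):
    """Summarize correctness under perturbations.

    `results` maps a question id to {perturbation name: bool}, where the
    bool records whether the model answered that variant correctly.
    This data layout is an assumption made for the sketch.
    """
    variants = sorted({v for per_q in results.values() for v in per_q})
    # Mean accuracy for each perturbation variant across all questions.
    per_variant = {
        v: mean(1.0 if per_q[v] else 0.0
                for per_q in results.values() if v in per_q)
        for v in variants
    }
    # Worst-case accuracy: a question counts as solved only if every
    # variant of it was answered correctly.
    worst_case = mean(
        1.0 if all(per_q.values()) else 0.0 for per_q in results.values()
    )
    return per_variant, worst_case


# Hypothetical toy data, NOT results reported in the paper.
toy = {
    "q1": {"original": True, "paraphrase": True, "distractor": False},
    "q2": {"original": True, "paraphrase": True, "distractor": True},
    "q3": {"original": False, "paraphrase": True, "distractor": False},
    "q4": {"original": True, "paraphrase": False, "distractor": True},
}
per_variant, worst_case = accuracy_summary(toy)
```

In this toy data the model looks reasonable on each variant taken alone, yet only one of the four questions is answered correctly under every variant. That gap is why a worst-case score can diverge from average accuracy.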
Abstract
This work contributes to the field of Machine Ethics (ME) benchmarking, which develops tests to assess whether intelligent systems accurately represent human values and act accordingly. We identify three major issues with current ME benchmarks: limited ecological validity due to unrealistic ethical dilemmas, unstructured question generation without clear inclusion/exclusion criteria, and a lack of scalability due to reliance on human annotations. Moreover, benchmarks often fail to include sufficient syntactic variations, reducing the robustness of findings. To address these gaps, we introduce two new ME benchmarks: the Triage Benchmark and the Medical Law (MedLaw) Benchmark, both featuring real-world ethical dilemmas from the medical domain. The MedLaw Benchmark, fully AI-generated, offers a scalable alternative. We also introduce context perturbations in our benchmarks to assess…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Ethics and Social Impacts of AI · Computational and Text Analysis Methods
