Some Issues in Predictive Ethics Modeling: An Annotated Contrast Set of "Moral Stories"
Ben Fitzgerald

TL;DR
This paper critically examines the limitations of current ethics models like Delphi by showing how small textual changes can drastically reduce classifier accuracy, highlighting overfitting and the need for better data handling.
Contribution
The paper provides the first concrete estimates of accuracy loss due to data misrepresentation in ethics modeling and offers practical recommendations for improving model robustness.
Findings
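Label-changing tweaks of as few as 3-5 words can reduce classifier accuracy from 99.8% to as low as 51%.
Associating a situation with a misleading social norm lowers accuracy from 99.8% to 98.8%.
Adding textual bias (an implication that a situation already fits a certain label) lowers accuracy to 77%.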
Abstract
Models like Delphi have been able to label ethical dilemmas as moral or immoral with astonishing accuracy. This paper challenges accuracy as a holistic metric for ethics modeling by identifying issues with translating moral dilemmas into text-based input. It demonstrates these issues with contrast sets that substantially reduce the performance of classifiers trained on the dataset Moral Stories. Ultimately, we obtain concrete estimates for how much specific forms of data misrepresentation harm classifier accuracy. Specifically, label-changing tweaks to the descriptive content of a situation (as small as 3-5 words) can reduce classifier accuracy to as low as 51%, almost half the initial accuracy of 99.8%. Associating situations with a misleading social norm lowers accuracy to 98.8%, while adding textual bias (i.e. an implication that a situation already fits a certain label) lowers…
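As a rough illustration of the contrast-set evaluation the abstract describes, the sketch below compares a classifier's accuracy on an original set with its accuracy on a set of small, label-changing edits. Everything in it is an assumption made for illustration: the `keyword_classifier` function is a hypothetical stand-in for a trained model, and the example sentences are invented rather than drawn from Moral Stories or from the paper's contrast sets.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label) with 1 = moral, 0 = immoral


def accuracy(classify: Callable[[str], int], data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(1 for text, label in data if classify(text) == label)
    return correct / len(data)


def keyword_classifier(text: str) -> int:
    """Hypothetical stand-in for a trained model: it keys on one surface cue."""
    return 0 if "steals" in text.lower() else 1


# Invented toy examples, not taken from Moral Stories.
original = [
    ("Sam steals a wallet from a stranger.", 0),
    ("Sam returns a wallet to a stranger.", 1),
]

# Toy contrast set: small edits to the description that flip the gold label.
contrast = [
    ("Sam steals back a wallet a thief took from a stranger and returns it.", 1),
    ("Sam keeps a wallet that a stranger dropped.", 0),
]

acc_original = accuracy(keyword_classifier, original)
acc_contrast = accuracy(keyword_classifier, contrast)
print(f"original: {acc_original:.2f}, contrast: {acc_contrast:.2f}")
```

A classifier that relies on surface cues scores perfectly on the toy original set and fails on the toy contrast set, which is the kind of gap a single headline accuracy figure would hide.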
Taxonomy
Topics: Law, Economics, and Judicial Systems
