Immune Moral Models? Pro-Social Rule Breaking as a Moral Enhancement Approach for Ethical AI
Rajitha Ramanayake, Philipp Wicke, Vivek Nallur

TL;DR
This paper explores how AI agents can incorporate pro-social rule breaking, inspired by human behavior, to make ethically beneficial decisions, using a study on vaccine distribution dilemmas to inform design strategies.
Contribution
It introduces the concept of pro-social rule breaking (PSRB) in AI, analyzes human responses to a vaccine distribution dilemma, and discusses design principles for ethical AI capable of PSRB.
Findings
Stakeholder utilities influence pro-social rule breaking decisions.
Neither deontological nor utilitarian ethics fully explain PSRB behavior.
The study informs future AI design for ethical decision-making.
Abstract
We are moving towards a future where Artificial Intelligence (AI) based agents make many decisions on behalf of humans. From healthcare decision making to social media censoring, these agents face problems and make decisions with ethical and societal implications. Ethical behaviour is a critical characteristic that we would like in a human-centric AI. A common observation in human-centric industries, like the service industry and healthcare, is that their professionals tend to break rules, if necessary, for pro-social reasons. This behaviour among humans is defined as pro-social rule breaking. To make AI agents more human-centric, we argue that there is a need for a mechanism that helps AI agents identify when to break rules set by their designers. To understand when AI agents need to break rules, we examine the conditions under which humans break rules for pro-social reasons. In this…
Taxonomy
Topics: Ethics and Social Impacts of AI · Psychology of Moral and Emotional Judgment · Hate Speech and Cyberbullying Detection
