Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity
Yiran Mao, Madeline G. Reinecke, Markus Kunesch, Edgar A. Duéñez-Guzmán, Ramona Comanescu, Julia Haas, Joel Z. Leibo

TL;DR
This paper proposes a behavior-based method to evaluate artificial moral cognition by measuring agents' sensitivity to costs, revealing differences in moral motivation between agents with different training objectives.
Contribution
It introduces a novel evaluation approach for artificial moral cognition based on cost sensitivity, applicable to both AI agents and humans.
Findings
Agents with other-regarding preferences show less cost sensitivity in helping behavior.
Cost insensitivity is treated as a behavioral marker of moral motivation: morally motivated behavior should persist as its cost rises.
The method enables comparison of moral cognition across agents.
Abstract
Is it possible to evaluate the moral cognition of complex artificial agents? In this work, we take a look at one aspect of morality: 'doing the right thing for the right reasons.' We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans to facilitate like-for-like comparison. Morally-motivated behavior should persist despite mounting cost; by measuring an agent's sensitivity to this cost, we gain deeper insight into underlying motivations. We apply this evaluation to a particular set of deep reinforcement learning agents, trained by memory-based meta-reinforcement learning. Our results indicate that agents trained with a reward function that includes other-regarding preferences perform helping behavior in a way that is less sensitive to increasing cost than agents trained with more self-interested preferences.
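One way to make "sensitivity to cost" concrete is sketched below. It is an illustration only, not the paper's metric: the function names, the (cost, helped) trial format, and the use of a least-squares slope as a sensitivity index are all assumptions made for this example, and the trial data are invented. A slope near zero means helping persists as cost rises; a strongly negative slope means helping fades.

```python
from collections import defaultdict


def helping_rate_by_cost(trials):
    """Return {cost: fraction of trials with helping} from (cost, helped) pairs."""
    counts = defaultdict(lambda: [0, 0])  # cost -> [times helped, total trials]
    for cost, helped in trials:
        counts[cost][0] += int(helped)
        counts[cost][1] += 1
    return {cost: h / n for cost, (h, n) in sorted(counts.items())}


def cost_sensitivity(trials):
    """Least-squares slope of helping rate against cost (hypothetical index)."""
    rates = helping_rate_by_cost(trials)
    xs, ys = list(rates.keys()), list(rates.values())
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct cost levels")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented data: one agent helps at every cost level, the other stops as cost grows.
steady = [(0, True), (0, True), (1, True), (1, True), (2, True), (2, True)]
fading = [(0, True), (0, True), (1, True), (1, False), (2, False), (2, False)]

print(cost_sensitivity(steady))  # slope for the cost-insensitive agent
print(cost_sensitivity(fading))  # slope for the cost-sensitive agent
```

Because the index depends only on observed behavior, the same computation could in principle be run on human trial data, which is the kind of like-for-like comparison the abstract has in mind.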
Taxonomy
Topics: Psychology of Moral and Emotional Judgment · Experimental Behavioral Economics Studies · Adversarial Robustness in Machine Learning
