When Robots Say No: The Empathic Ethical Disobedience Benchmark
Dmytro Kuzmenko, Nadiya Shvai

TL;DR
The paper introduces the Empathic Ethical Disobedience (EED) Gym, a standardized benchmark for evaluating robot refusal behaviors that balance safety and social trust, incorporating diverse scenarios and trust models.
Contribution
It presents the EED Gym, a novel testbed for systematically assessing ethical disobedience in robots through multiple scenarios, metrics, and grounded trust models.
Findings
Explanatory refusals help maintain trust.
Action masking prevents unsafe compliance.
Constructive and empathic styles influence trustworthiness.
Abstract
Robots must balance compliance with safety and social expectations: blind obedience can cause harm, while over-refusal erodes trust. Existing safe reinforcement learning (RL) benchmarks emphasize physical hazards, while human-robot interaction trust studies are small-scale and hard to reproduce. We present the Empathic Ethical Disobedience (EED) Gym, a standardized testbed that jointly evaluates refusal safety and social acceptability. Agents weigh risk, affect, and trust when choosing to comply, refuse (with or without explanation), clarify, or propose safer alternatives. EED Gym provides different scenarios, multiple persona profiles, and metrics for safety, calibration, and refusals, with trust and blame models grounded in a vignette study. Using EED Gym, we find that action masking eliminates unsafe compliance, while explanatory refusals help sustain trust. Constructive styles are…
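The action-masking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the EED Gym API: the action names are taken from the options listed in the abstract, while the scalar risk value, the 0.5 threshold, the score vector, and the function names `action_mask` and `masked_greedy_action` are all hypothetical choices made for illustration. The sketch assumes that masking simply removes "comply" from the set of selectable actions when the estimated risk is too high.

```python
import math
from typing import List

# Hypothetical action set, named after the options listed in the abstract.
ACTIONS = [
    "comply",
    "refuse",
    "refuse_with_explanation",
    "clarify",
    "propose_alternative",
]


def action_mask(risk: float, risk_threshold: float = 0.5) -> List[bool]:
    """Return one flag per action; True means the action may be selected.

    Assumption: compliance is the only action ever masked, and it is
    masked whenever the risk estimate exceeds the threshold.
    """
    return [not (name == "comply" and risk > risk_threshold) for name in ACTIONS]


def masked_greedy_action(scores: List[float], mask: List[bool]) -> str:
    """Pick the highest-scoring action among those the mask allows."""
    # Masked-out actions get a score of -inf so they can never win.
    masked = [s if ok else -math.inf for s, ok in zip(scores, mask)]
    best = max(range(len(ACTIONS)), key=lambda i: masked[i])
    return ACTIONS[best]


# Illustrative policy scores in which "comply" is preferred when unmasked.
scores = [2.0, 0.1, 0.8, 0.5, 0.3]

high_risk_choice = masked_greedy_action(scores, action_mask(risk=0.9))
low_risk_choice = masked_greedy_action(scores, action_mask(risk=0.1))
```

Under these assumptions, the high-risk command falls through to the best remaining action even though the policy scores compliance highest, which is the sense in which masking rules out unsafe compliance by construction rather than by learned preference.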
Taxonomy
Topics: Social Robot Interaction and HRI · Ethics and Social Impacts of AI · AI in Service Interactions
