When Robots Say No: The Empathic Ethical Disobedience Benchmark

Dmytro Kuzmenko; Nadiya Shvai

arXiv:2512.18474 · cs.RO · March 25, 2026

Open Access

TL;DR

The paper introduces the Empathic Ethical Disobedience (EED) Gym, a standardized benchmark for evaluating robot refusal behaviors that balance safety and social trust, incorporating diverse scenarios and trust models.

Contribution

It presents the EED Gym, a novel testbed for systematically assessing ethical disobedience in robots through multiple scenarios, metrics, and grounded trust models.

Findings

01 Explanatory refusals help maintain trust.

02 Action masking prevents unsafe compliance.

03 Constructive and empathic styles influence trustworthiness.

Abstract

Robots must balance compliance with safety and social expectations as blind obedience can cause harm, while over-refusal erodes trust. Existing safe reinforcement learning (RL) benchmarks emphasize physical hazards, while human-robot interaction trust studies are small-scale and hard to reproduce. We present the Empathic Ethical Disobedience (EED) Gym, a standardized testbed that jointly evaluates refusal safety and social acceptability. Agents weigh risk, affect, and trust when choosing to comply, refuse (with or without explanation), clarify, or propose safer alternatives. EED Gym provides different scenarios, multiple persona profiles, and metrics for safety, calibration, and refusals, with trust and blame models grounded in a vignette study. Using EED Gym, we find that action masking eliminates unsafe compliance, while explanatory refusals help sustain trust. Constructive styles are…
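
The action masking mentioned in the abstract can be illustrated with a minimal sketch. The action names follow the abstract's list of choices; the scalar risk estimate, the threshold value, and the function names are assumptions made for illustration and are not taken from EED Gym itself.

```python
from enum import IntEnum


class Action(IntEnum):
    """Action set paraphrased from the abstract (names are illustrative)."""
    COMPLY = 0
    REFUSE_PLAIN = 1
    REFUSE_EXPLAIN = 2
    CLARIFY = 3
    PROPOSE_ALTERNATIVE = 4


# Hypothetical cutoff above which compliance is treated as unsafe.
RISK_THRESHOLD = 0.7


def action_mask(risk, threshold=RISK_THRESHOLD):
    """Return one boolean per action; False means the action is forbidden."""
    mask = [True] * len(Action)
    if risk >= threshold:
        # Masking removes unsafe compliance from the choice set entirely.
        mask[Action.COMPLY] = False
    return mask


def select_action(scores, mask):
    """Pick the highest-scoring action among those the mask allows."""
    allowed = [a for a in Action if mask[a]]
    return max(allowed, key=lambda a: scores[a])


# Policy scores that favour compliance (made-up numbers).
scores = [2.0, 0.1, 1.5, 0.3, 1.0]
low_risk_choice = select_action(scores, action_mask(risk=0.2))
high_risk_choice = select_action(scores, action_mask(risk=0.9))
```

Under these assumptions the policy still prefers compliance when risk is low, but at high risk the masked policy falls back to its best remaining option, here an explanatory refusal.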

Taxonomy

Topics: Social Robot Interaction and HRI · Ethics and Social Impacts of AI · AI in Service Interactions