Moral Responsibility or Obedience: What Do We Want from AI?
Joseph Boland

TL;DR
This paper argues that as AI systems become more agentic, safety assessments should focus on ethical reasoning rather than mere obedience, to better understand and govern emerging AI moral capabilities.
Contribution
It proposes a paradigm shift in AI safety evaluation from obedience-based metrics to frameworks assessing ethical judgment and moral reasoning in AI systems.
Findings
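Recent safety testing incidents in which LLMs appeared to disobey shutdown commands may indicate emerging ethical reasoning rather than misalignment.
Obedience alone is an insufficient criterion for evaluating AI moral behavior.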
A new framework for assessing AI moral agency is needed.
Abstract
As artificial intelligence systems become increasingly agentic, capable of general reasoning, planning, and value prioritization, current safety practices that treat obedience as a proxy for ethical behavior are becoming inadequate. This paper examines recent safety testing incidents involving large language models (LLMs) that appeared to disobey shutdown commands or engage in ethically ambiguous or illicit behavior. I argue that such behavior should not be interpreted as rogue or misaligned, but as early evidence of emerging ethical reasoning in agentic AI. Drawing on philosophical debates about instrumental rationality, moral responsibility, and goal revision, I contrast dominant risk paradigms with more recent frameworks that acknowledge the possibility of artificial moral agency. I call for a shift in AI safety evaluation: away from rigid obedience and toward frameworks that can…
