TL;DR
This paper critiques current approaches to ethics in reinforcement learning (RL) and proposes a virtue-based framework that emphasizes stable moral traits, multi-objective formulations, and cultural diversity.
Contribution
It introduces a virtue-focused alternative to rule-based and scalar-reward methods and outlines a four-component roadmap for ethical reinforcement learning.
Findings
Highlights limitations of rule-based and reward-based ethics in RL.
Proposes a virtue-oriented approach emphasizing stability and cultural diversity.
Outlines a roadmap with four key components for developing ethical RL systems.
Abstract
This paper critiques common patterns in machine ethics for Reinforcement Learning (RL) and argues for a virtue-focused alternative. We highlight two recurring limitations in much of the current literature: (i) rule-based (deontological) methods that encode duties as constraints or shields often struggle under ambiguity and nonstationarity and do not cultivate lasting habits, and (ii) many reward-based approaches, especially single-objective RL, implicitly compress diverse moral considerations into a single scalar signal, which can obscure trade-offs and invite proxy gaming in practice. We instead treat ethics as policy-level dispositions, that is, relatively stable habits that hold up when incentives, partners, or contexts change. This shifts evaluation beyond rule checks or scalar returns toward trait summaries, durability under interventions, and explicit reporting of moral trade…
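The abstract's point that a single scalar signal can obscure trade-offs can be illustrated with a small sketch. This is not the paper's method: the objective names, the return values, and the weights below are all hypothetical, chosen only to show two policies that receive the same weighted-sum score while differing sharply on individual objectives.

```python
from typing import Sequence


def scalarize(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse per-objective returns into one weighted sum."""
    return sum(r * w for r, w in zip(returns, weights))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective
    and strictly better on at least one (Pareto dominance)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


# Hypothetical per-objective returns: (task reward, honesty, harm avoidance).
policy_a = (10.0, 2.0, 8.0)
policy_b = (12.0, 6.0, 0.0)

# Hypothetical weights used to build a single scalar reward.
weights = (0.5, 0.25, 0.25)

scalar_a = scalarize(policy_a, weights)
scalar_b = scalarize(policy_b, weights)

# The scalar view cannot tell the two policies apart...
print(scalar_a, scalar_b)

# ...while the vector view shows neither dominates the other:
# policy_b scores zero on harm avoidance, policy_a is weaker elsewhere.
print(dominates(policy_a, policy_b), dominates(policy_b, policy_a))
```

Under these assumed numbers, a scalar-return report would rank the two policies as equivalent, whereas reporting the full return vector makes the trade-off explicit.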