A Definition of Happiness for Reinforcement Learning Agents
Mayank Daswani, Jan Leike

TL;DR
This paper proposes a formal definition of happiness for reinforcement learning agents: the temporal difference error. The definition is consistent with empirical research on humans and satisfies most of the authors' desiderata.
Contribution
It introduces a novel formal definition of happiness for RL agents based on temporal difference error, bridging AI and human happiness research.
Findings
The definition is consistent with empirical findings on human happiness.
It satisfies most of the proposed desiderata.
Implications for AI and human happiness are discussed.
Abstract
What is happiness for reinforcement learning agents? We seek a formal definition satisfying a list of desiderata. Our proposed definition of happiness is the temporal difference error, i.e. the difference between the value of the obtained reward and observation and the agent's expectation of this value. This definition satisfies most of our desiderata and is compatible with empirical research on humans. We state several implications and discuss examples.
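The abstract's definition can be made concrete with the standard one-step temporal difference error from tabular reinforcement learning, δ = r + γV(s') − V(s). The sketch below is a hedged illustration, not the authors' own formulation. It assumes a Markov state, a discount factor `gamma`, and a dictionary of value estimates. The function name `td_error`, its parameters, and the default `gamma=0.9` are hypothetical. The paper's definition may be stated over histories rather than states.

```python
from typing import Dict, Hashable


def td_error(
    reward: float,
    value: Dict[Hashable, float],
    state: Hashable,
    next_state: Hashable,
    gamma: float = 0.9,  # hypothetical discount factor, not taken from the paper
    terminal: bool = False,
) -> float:
    """One-step TD error: delta = r + gamma * V(s') - V(s).

    Under the reading sketched here, delta plays the role of "happiness".
    It is the obtained value (reward plus discounted value of what follows)
    minus what the agent expected from `state`. Unknown states default to a
    value of 0.0, and a terminal transition contributes no future value.
    """
    next_value = 0.0 if terminal else value.get(next_state, 0.0)
    return reward + gamma * next_value - value.get(state, 0.0)


if __name__ == "__main__":
    # The agent expects a value of 1.0 from state "s" before an episode ends.
    v = {"s": 1.0}
    print(td_error(2.0, v, "s", "end", terminal=True))  # better than expected
    print(td_error(1.0, v, "s", "end", terminal=True))  # exactly as expected
    print(td_error(0.0, v, "s", "end", terminal=True))  # worse than expected
```

In this toy example the same reward of 1.0 gives zero "happiness" because it matches the agent's expectation. Only the gap between outcome and expectation is positive or negative. This is the intuition behind using the TD error rather than the raw reward.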
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Neural dynamics and brain function
