Reinforcement Learning Beyond Expectation
Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark, Radha Poovendran

TL;DR
This paper introduces reinforcement learning algorithms that optimize a cost based on cumulative prospect theory (CPT), enabling autonomous agents to more closely mimic human decision-making and preferences in unknown environments.
Contribution
It develops a CPT-value-based framework and accompanying reinforcement learning algorithms, extending traditional expected-utility optimization so that agent behavior aligns more closely with human behavior.
Findings
Algorithms successfully learned human-aligned behaviors.
Improved performance over baseline expected-utility methods.
Effective in obstacle-avoidance and target-reaching tasks.
Abstract
The inputs and preferences of human users are important considerations in situations where these users interact with autonomous cyber or cyber-physical systems. In these scenarios, one is often interested in aligning behaviors of the system with the preferences of one or more human users. Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently. In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment. In traditional reinforcement learning, these behaviors are learned through repeated interactions with the environment by optimizing an expected utility. In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost. We introduce the notion of the CPT-value of an action taken in a state, and…
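To make the contrast with expected utility concrete, the sketch below computes an empirical CPT value for a set of sampled outcomes. It uses the standard sorted-sample estimator with the Tversky and Kahneman (1992) functional forms: a concave utility for gains, a steeper loss-averse utility for losses, and inverse-S-shaped probability weighting. This is a minimal illustration under stated assumptions, not the authors' algorithm. The names `cpt_value`, `weight`, `u_gain`, and `u_loss` are hypothetical. Outcomes are treated as rewards relative to a reference point of 0 (a cost can be negated first). The parameter values are the 1992 estimates, used only as illustrative defaults rather than values from this paper.

```python
from typing import Sequence

# Illustrative defaults (Tversky & Kahneman, 1992); NOT taken from the paper.
ALPHA = 0.88   # curvature of the utility for gains
BETA = 0.88    # curvature of the utility for losses
LAMBDA = 2.25  # loss-aversion coefficient (losses loom larger than gains)
GAMMA = 0.61   # probability-weighting exponent for gains
DELTA = 0.69   # probability-weighting exponent for losses


def weight(p: float, eta: float) -> float:
    """Inverse-S probability weighting w(p) = p^eta / (p^eta + (1-p)^eta)^(1/eta)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** eta
    return num / (num + (1.0 - p) ** eta) ** (1.0 / eta)


def u_gain(x: float) -> float:
    """Utility of the positive part of an outcome (reference point 0)."""
    return x ** ALPHA if x > 0 else 0.0


def u_loss(x: float) -> float:
    """Disutility of the negative part of an outcome, scaled by loss aversion."""
    return LAMBDA * (-x) ** BETA if x < 0 else 0.0


def cpt_value(samples: Sequence[float]) -> float:
    """Hypothetical helper: empirical CPT value of i.i.d. outcome samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    gains = 0.0
    losses = 0.0
    for i, x in enumerate(xs, start=1):
        # Gains are weighted through the decumulative distribution P(X >= x).
        gains += u_gain(x) * (weight((n + 1 - i) / n, GAMMA)
                              - weight((n - i) / n, GAMMA))
        # Losses are weighted through the cumulative distribution P(X <= x).
        losses += u_loss(x) * (weight(i / n, DELTA)
                               - weight((i - 1) / n, DELTA))
    return gains - losses


if __name__ == "__main__":
    # Both options have expected value 0, but CPT scores them differently.
    print("sure outcome 0:", cpt_value([0.0]))
    print("50/50 gamble of +10 / -10:", cpt_value([-10.0, 10.0]))
```

With the weighting exponents set to 1 and the utilities made linear, this estimator reduces to the sample mean, which is the expected-utility objective of traditional reinforcement learning. With the defaults above, the symmetric ±10 gamble receives a negative CPT value while the sure outcome scores 0. This reflects the asymmetric treatment of gains and losses that the abstract refers to.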
