Prospect-theoretic Q-learning
Vivek S. Borkar, Siddharth Chandak

TL;DR
This paper introduces a prospect-theoretic modification to Q-learning, accounting for distorted reward perception, and analyzes its convergence and equilibrium properties using dynamical systems theory.
Contribution
It presents a novel prospect-theoretic Q-learning algorithm and provides a rigorous analysis of its asymptotic behavior and equilibrium characteristics.
Findings
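Convergence to equilibria is established by analyzing the limiting differential equation with the theory of monotone dynamical systems.
Some qualitative facts about the equilibria themselves are established.
The perceived future reward is distorted and noisy: a nonlinearity accentuates gains and underrepresents losses relative to a reference point.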
Abstract
We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point. We analyze the asymptotic behavior of the scheme through its limiting differential equation, using the theory of monotone dynamical systems. Specifically, we show convergence to equilibria, and establish some qualitative facts about the equilibria themselves.
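The scheme described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's formulation: the piecewise-linear `distort` function (slopes `k_gain` > 1 and `k_loss` < 1 around a reference point `ref`), the choice to add Gaussian noise before applying the distortion, the uniform exploration policy, the 1/n step size, and the toy two-state MDP. The parameters are picked so that `gamma * k_gain < 1`, which keeps the distorted Bellman map a sup-norm contraction in this simplified setting.

```python
import numpy as np

def distort(x, ref=0.0, k_gain=1.2, k_loss=0.8):
    """Hypothetical distortion: amplify gains, shrink losses around `ref`."""
    d = x - ref
    return float(ref + np.where(d >= 0, k_gain * d, k_loss * d))

def pt_q_learning(P, R, gamma=0.7, steps=5000, noise_std=0.1, seed=0):
    """Tabular Q-learning where the future-reward term is noisy and distorted.

    P has shape (S, A, S) with transition probabilities; R has shape (S, A).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions)               # uniform exploration (assumption)
        s_next = rng.choice(n_states, p=P[s, a])  # sample the next state
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                # decreasing step size (assumption)
        # Perceived future reward: noisy, then passed through the nonlinearity.
        noisy_future = Q[s_next].max() + noise_std * rng.standard_normal()
        perceived = distort(noisy_future)
        Q[s, a] += alpha * (R[s, a] + gamma * perceived - Q[s, a])
        s = s_next
    return Q

# Toy two-state, two-action MDP (made up for illustration).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, -1.0],
              [0.0, 2.0]])
Q = pt_q_learning(P, R)
print(Q)
```

Setting `k_gain = k_loss = 1` and `noise_std = 0` recovers the classical Q-learning update, which makes it easy to compare the two iterates on the same MDP.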
Taxonomy
Methods: Q-Learning
