Prospect-theoretic Q-learning

Vivek S. Borkar; Siddharth Chandak

arXiv:2104.05311·eess.SY·September 2, 2021

TL;DR

This paper introduces a prospect-theoretic modification to Q-learning, accounting for distorted reward perception, and analyzes its convergence and equilibrium properties using dynamical systems theory.

Contribution

It presents a novel prospect-theoretic Q-learning algorithm and provides a rigorous analysis of its asymptotic behavior and equilibrium characteristics.

Findings

01 Convergence to equilibria is established.

02 Qualitative properties of the equilibria are characterized.

03 The scheme models a distorted, noisy perception of future reward relative to a reference point.

Abstract

We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point. We analyze the asymptotic behavior of the scheme through its limiting differential equation, using the theory of monotone dynamical systems. Specifically, we show convergence to equilibria and establish some qualitative facts about the equilibria themselves.
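
The idea in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' algorithm: the piecewise-linear `distort` function, its slopes, the zero reference point, the placement of the Gaussian noise, the uniform sampling of state-action pairs, the 1/n step size, and the two-state toy MDP are all assumptions chosen only to make the example runnable. The one feature taken from the abstract is that the future-reward term passes through a nonlinearity that amplifies gains and shrinks losses relative to a reference point before it enters the Q-learning update.

```python
import random


def distort(x, ref=0.0, gain_slope=1.5, loss_slope=0.5):
    """Hypothetical distortion: amplify gains, shrink losses around `ref`."""
    delta = x - ref
    slope = gain_slope if delta >= 0 else loss_slope
    return ref + slope * delta


def prospect_q_learning(transitions, rewards, n_states, n_actions,
                        gamma=0.5, steps=20000, noise_std=0.1, seed=0):
    """Tabular Q-learning with a noisy, distorted future-value term.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the one-step reward (both are toy inputs).
    gamma * gain_slope is kept below 1 here so the iterates stay bounded;
    this is a convenience of the sketch, not a condition from the paper.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(steps):
        # Sample a state-action pair uniformly (assumed exploration scheme).
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        probs, nexts = zip(*transitions[s][a])
        s_next = rng.choices(nexts, weights=probs)[0]
        visits[s][a] += 1
        step = 1.0 / visits[s][a]
        # The controller perceives the future value with noise, then distorted.
        perceived = distort(max(q[s_next]) + rng.gauss(0.0, noise_std))
        target = rewards[s][a] + gamma * perceived
        q[s][a] += step * (target - q[s][a])
    return q


# Hypothetical two-state, two-action MDP used only for illustration.
transitions = [
    [[(1.0, 0)], [(0.5, 0), (0.5, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
rewards = [[0.0, 1.0], [-1.0, 0.5]]
q = prospect_q_learning(transitions, rewards, n_states=2, n_actions=2)
```

Setting both slopes to 1 and `noise_std` to 0 recovers ordinary Q-learning on the same toy problem, which gives a baseline for seeing how the distortion shifts the limiting Q-values.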

Taxonomy

Methods: Q-Learning