Reinforcement Learning with Non-Exponential Discounting
Matthias Schultheis, Constantin A. Rothkopf, Heinz Koeppl

TL;DR
This paper develops a continuous-time reinforcement learning framework that accommodates arbitrary discount functions, including hyperbolic discounting, using a Hamilton-Jacobi-Bellman equation and deep learning methods.
Contribution
It introduces a generalized RL theory for non-exponential discounting, deriving an HJB equation and a collocation-based solution approach, and explores inverse RL for discount function recovery.
Findings
The approach is validated on two simulated problems.
The framework handles non-exponential discount functions, including hyperbolic discounting.
Inverse RL for recovering the discount function offers a way to analyze human decision-making patterns.
Abstract
Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover…
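The abstract's remark that hyperbolic discounting is optimal under a specific termination time distribution can be illustrated with a standard textbook result, which may or may not match the construction used in the paper: if a task terminates at a constant but unknown hazard rate, and that rate is drawn from an exponential prior with mean k, then the probability that the task is still running at time t is 1/(1 + k t), which is the hyperbolic discount function. The Python sketch below checks this numerically. The function names, the parameter k, and the integration settings are assumptions made for illustration only.

```python
import math


def exponential_discount(t, rate):
    """Exponential discount factor exp(-rate * t)."""
    return math.exp(-rate * t)


def hyperbolic_discount(t, k):
    """Hyperbolic discount factor 1 / (1 + k * t)."""
    return 1.0 / (1.0 + k * t)


def survival_under_uncertain_hazard(t, k, n_steps=100_000, cutoff=50.0):
    """Probability of no termination before time t when the hazard rate
    lam is unknown and follows an exponential prior with mean k.

    Approximates the integral of exp(-lam * t) * (1/k) * exp(-lam / k)
    over lam in [0, cutoff * k] with the midpoint rule. The cutoff is an
    illustrative choice; the neglected tail is of order exp(-cutoff).
    """
    lam_max = cutoff * k
    h = lam_max / n_steps
    total = 0.0
    for i in range(n_steps):
        lam = (i + 0.5) * h                  # midpoint of the i-th cell
        prior = math.exp(-lam / k) / k       # exponential prior density
        total += math.exp(-lam * t) * prior * h
    return total


if __name__ == "__main__":
    k = 1.0  # hypothetical prior mean of the hazard rate
    for t in (0.0, 1.0, 5.0, 10.0):
        print(
            t,
            survival_under_uncertain_hazard(t, k),
            hyperbolic_discount(t, k),
            exponential_discount(t, k),
        )
```

Under these assumptions the numerically integrated survival probability agrees with the hyperbolic form, while an exponential discount with the same rate decays much faster at long horizons. This is the sense in which a discount function can be read as the survival function of a random termination time.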
Taxonomy
Topics: Decision-Making and Behavioral Economics
