TL;DR
This paper explores infinite-horizon Markov decision processes with time-varying discount factors, analyzing the existence and computation of subgame perfect equilibria from a game-theoretic perspective.
Contribution
It introduces a game-theoretic framework for MDPs with time-varying discounting, proves that a subgame perfect equilibrium (SPE) exists, and develops algorithms for approximate solutions.
Findings
A subgame perfect equilibrium (SPE) exists in MDPs with time-varying discount factors
Computing an SPE is EXPTIME-hard
An algorithm for computing $\epsilon$-SPE with complexity bounds
Abstract
Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of $\epsilon$-SPE and show that an $\epsilon$-SPE exists under milder assumptions.…
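To make the "one decision maker per time step" view concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes one natural reading of the abstract: the decision maker at step t weights a reward received at step k >= t by gamma_t^(k - t), using its own fixed factor gamma_t. The function name, the reward sequence, and the discount factors are all hypothetical, and the infinite horizon is truncated to a short finite sequence purely for illustration.

```python
from typing import Sequence


def value_seen_by_agent(rewards: Sequence[float],
                        gammas: Sequence[float],
                        t: int) -> float:
    """Discounted return from step t onward, as evaluated by the step-t agent.

    Assumption (not confirmed by the abstract): the step-t agent applies its
    own fixed factor gammas[t] geometrically to every later reward.
    """
    gamma_t = gammas[t]
    return sum(gamma_t ** (k - t) * rewards[k] for k in range(t, len(rewards)))


# Hypothetical data: a constant reward stream and alternating discount factors.
rewards = [1.0, 1.0, 1.0, 1.0]
gammas = [0.5, 0.9, 0.5, 0.9]

# Each step's agent values the same remaining reward stream differently.
values = [value_seen_by_agent(rewards, gammas, t) for t in range(len(rewards))]
print(values)
```

Because agents at different steps rank the same future differently, a plan that is optimal for the step-0 agent need not be one that later agents would follow, which is why an equilibrium notion such as SPE is used rather than a single optimal policy.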