Discounting the Past
Taylor Dohmen, Ashutosh Trivedi

TL;DR
This paper introduces past-discounting for stochastic games, in which players have geometrically decaying memory of the rewards encountered during a play. It analyzes the effect on strategy complexity and establishes determinacy results for discounted and mean payoffs of past-discounted rewards.
Contribution
It proposes the concept of past-discounting, studies its effects on game determinacy, and provides reductions to standard models for certain objectives.
Findings
Positional determinacy fails for the limit inferior (liminf) of past-discounted rewards.
For that liminf objective, optimal strategies may require unbounded memory.
For discounted and mean (average) payoffs of past-discounted rewards, determinacy holds with stationary strategies.
Abstract
Stochastic games with discounted payoff, introduced by Shapley, model adversarial interactions in stochastic environments where two players try to optimize a discounted sum of rewards. In this model, long-term weights are geometrically attenuated based on the delay in their occurrence. We propose a temporally dual notion -- called past-discounting -- where agents have geometrically decaying memory of the rewards encountered during a play of the game. We study objective functions based on past-discounted weight sequences and examine the corresponding stochastic games with liminf, discounted, and mean payoffs. For objectives specified as the limit inferior of past-discounted reward sequences, we show that positional determinacy fails and that optimal strategies may require unbounded memory. To overcome this obstacle, we study an approximate windowed objective based on the idea of using…
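To make the contrast between the two kinds of discounting concrete, here is a minimal Python sketch. It assumes one natural formalization that the abstract does not spell out: the past-discounted value at step n is v_n = sum over k <= n of lam^(n-k) * w_k, so that older rewards fade geometrically. The function names, the symbol lam, and the sample rewards are illustrative choices, not notation taken from the paper.

```python
from typing import List, Sequence


def past_discounted(rewards: Sequence[float], lam: float) -> List[float]:
    """Running past-discounted values v_0, v_1, ... of a reward sequence.

    Assumed definition (not quoted from the paper):
        v_n = sum_{k=0}^{n} lam**(n - k) * rewards[k]
    which satisfies the recurrence v_n = lam * v_{n-1} + rewards[n].
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    values: List[float] = []
    v = 0.0
    for w in rewards:
        v = lam * v + w  # older rewards decay by one more factor of lam
        values.append(v)
    return values


def future_discounted(rewards: Sequence[float], lam: float) -> float:
    """Classical discounted sum: later rewards are attenuated."""
    return sum(lam ** k * w for k, w in enumerate(rewards))


rewards = [4.0, 0.0, 0.0, 2.0]
print(past_discounted(rewards, 0.5))    # [4.0, 2.0, 1.0, 2.5]
print(future_discounted(rewards, 0.5))  # 4.25
```

In this example the early reward of 4 dominates the classical discounted sum, while in the past-discounted sequence it has faded to 0.5 by the last step and the recent reward of 2 dominates. Objectives such as the liminf or the mean payoff would then be evaluated on the sequence returned by `past_discounted`.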
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Reinforcement Learning in Robotics
