Examining average and discounted reward optimality criteria in reinforcement learning
Vektor Dewanto, Marcus Gallagher

TL;DR
This paper explores the fundamental differences and relationships between average and discounted reward criteria in reinforcement learning, highlighting the advantages of directly optimizing average rewards without artificial discounting.
Contribution
It provides a comprehensive analysis of the connection between average and discounted rewards and advocates for discounting-free methods in RL.
Findings
Average reward criteria can be directly optimized in RL.
Discounting-free RL methods are feasible and beneficial.
The relationship between average and discounted rewards is clarified.
Abstract
In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it is problematic to apply in environments without an inherent notion of discounting. This motivates us to revisit a) the progression of optimality criteria in dynamic programming, b) justification for and complication of an artificial discount factor, and c) benefits of directly maximizing the average reward criterion, which is discounting-free. Our contributions include a thorough examination of the relationship between average and discounted rewards, as well as a discussion of their pros and cons in RL. We emphasize that average-reward RL methods possess the ingredient and mechanism for applying a family of discounting-free optimality criteria…
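The link between the two criteria mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the paper. It assumes a hypothetical two-state ergodic Markov reward process (the transition matrix `P`, the reward vector `r`, and both helper functions are made up for illustration) and checks the standard result that the discounted value scaled by (1 - gamma) approaches the average reward as gamma approaches 1.

```python
import numpy as np

# Hypothetical 2-state Markov reward process induced by some fixed policy.
# These numbers are illustrative assumptions, not from the paper.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])   # row-stochastic transition matrix
r = np.array([1.0, 0.0])     # expected reward in each state


def average_reward(P, r):
    """Average reward (gain) of an ergodic chain: stationary distribution dot rewards."""
    n = P.shape[0]
    # Stack pi P = pi with the normalisation sum(pi) = 1 and solve by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ r)


def discounted_value(P, r, gamma):
    """Discounted value vector, solving (I - gamma P) v = r."""
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P, r)


gain = average_reward(P, r)
for gamma in (0.9, 0.99, 0.999, 0.9999):
    scaled = (1.0 - gamma) * discounted_value(P, r, gamma)
    print(f"gamma={gamma}: (1 - gamma) * v_gamma = {scaled}, gain = {gain:.6f}")
```

In this toy chain the scaled discounted values of both states converge to the same number, the gain, which does not depend on the start state. The discounted values themselves grow without bound as gamma approaches 1, which is one reason the choice of an artificial discount factor matters.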
Taxonomy
Topics: Supply Chain and Inventory Management · Decision-Making and Behavioral Economics · Auction Theory and Applications
