MDP Geometry, Normalization and Reward Balancing Solvers
Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis

TL;DR
This paper introduces a geometric perspective on MDPs with normalization techniques that preserve advantages, leading to new algorithms called Reward Balancing which improve sample complexity for unknown transition probabilities.
Contribution
The paper proposes a new geometric interpretation of MDPs and develops Reward Balancing algorithms, with a convergence analysis and improved sample complexity when transition probabilities are unknown.
Findings
The normalization underlying Reward Balancing adjusts state values without altering the advantage of any action with respect to any policy.
The methods improve upon existing sample complexity bounds for unknown transition MDPs.
A convergence analysis is provided for several algorithms in the Reward Balancing class.
Abstract
We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call Reward Balancing, that solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
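The abstract does not spell out the transformation, so the following is only a minimal sketch of one standard reward shift with the stated property, assuming the form r'(s,a) = r(s,a) - delta(s) + gamma * sum_s' P(s'|s,a) * delta(s'). Under this assumed form, every policy's value at state s drops by delta(s) while every advantage is unchanged. Whether this matches the paper's exact normalization is an assumption, and all names here (`policy_values`, `advantages`, `shift_rewards`, the random toy MDP) are hypothetical.

```python
import random

def policy_values(P, r, policy, gamma, iters=2000):
    """Iterative policy evaluation for a deterministic policy."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s][policy[s]]
             + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

def advantages(P, r, policy, gamma):
    """A(s,a) = r(s,a) + gamma * E[V(s')] - V(s) under the given policy."""
    n, m = len(r), len(r[0])
    V = policy_values(P, r, policy, gamma)
    return [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n)) - V[s]
             for a in range(m)] for s in range(n)]

def shift_rewards(P, r, delta, gamma):
    """Assumed transformation: r'(s,a) = r(s,a) - delta(s) + gamma * E[delta(s')]."""
    n, m = len(r), len(r[0])
    return [[r[s][a] - delta[s]
             + gamma * sum(P[s][a][t] * delta[t] for t in range(n))
             for a in range(m)] for s in range(n)]

# Toy MDP with 3 states and 2 actions (random, for illustration only).
rng = random.Random(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = []
for s in range(n_states):
    P.append([])
    for a in range(n_actions):
        row = [rng.random() for _ in range(n_states)]
        total = sum(row)
        P[s].append([x / total for x in row])  # normalize to a distribution
r = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)] for _ in range(n_states)]

policy = [0, 1, 0]                          # an arbitrary deterministic policy
V_before = policy_values(P, r, policy, gamma)
A_before = advantages(P, r, policy, gamma)

delta = V_before                            # shift by this policy's own values
r_shifted = shift_rewards(P, r, delta, gamma)
V_after = policy_values(P, r_shifted, policy, gamma)
A_after = advantages(P, r_shifted, policy, gamma)

# Largest change in any advantage, and largest remaining value magnitude.
max_adv_gap = max(abs(A_before[s][a] - A_after[s][a])
                  for s in range(n_states) for a in range(n_actions))
max_value_after = max(abs(v) for v in V_after)
```

Choosing `delta` equal to a policy's own value function drives that policy's values to zero in the shifted MDP, which gives a rough sense of how repeated shifts could move an MDP toward a form where a good policy is easy to read off.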
Taxonomy
Topics: Matrix Theory and Algorithms
