MDP Geometry, Normalization and Reward Balancing Solvers

Arsenii Mustafin; Aleksei Pakharev; Alex Olshevsky; Ioannis Ch. Paschalidis

arXiv:2407.06712·cs.LG·March 6, 2025

TL;DR

This paper introduces a geometric view of MDPs together with a normalization that shifts the value function at each state without changing the advantage of any action. Building on this, the authors propose a class of algorithms called Reward Balancing and show improved sample complexity for MDPs with unknown transition probabilities.

Contribution

The paper proposes a new geometric interpretation of MDPs and develops Reward Balancing algorithms, with a convergence analysis and improved sample complexity results when transition probabilities are unknown.

Findings

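01. The normalization transformations behind Reward Balancing adjust the value function at each state while leaving the advantage of every action, with respect to every policy, unchanged.

02. For MDPs with unknown transition probabilities, the analysis improves upon state-of-the-art sample complexity results.

03. A convergence analysis is provided for several algorithms in the Reward Balancing class.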

Abstract

We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms which we call Reward Balancing, which solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
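To make the idea in the abstract concrete, here is a minimal sketch in Python with NumPy. It assumes the advantage-preserving transformation can be written as a per-state shift phi applied to the rewards, r'(s,a) = r(s,a) - phi(s) + gamma * sum_s' P(s'|s,a) phi(s'), which lowers every policy's value at s by phi(s) and leaves all advantages unchanged. The function names (shift_rewards, balance_rewards, and so on), the array layout P[s, a, s'], and the particular choice phi(s) = max_a r(s,a) are illustrative assumptions, not the paper's algorithms. With that choice the accumulated shift coincides with standard value iteration, so the loop is only meant to show what "iterating until a policy can be trivially found" can look like.

```python
import numpy as np

def shift_rewards(P, r, phi, gamma):
    """Apply the assumed shift: r'(s,a) = r(s,a) - phi(s) + gamma * E[phi(s')].

    P has shape (S, A, S) with P[s, a, s'] a transition probability,
    r has shape (S, A), phi has shape (S,).
    """
    return r - phi[:, None] + gamma * (P @ phi)

def policy_values(P, r, pi, gamma):
    """Exact V^pi and Q^pi for a deterministic policy pi (array of action indices)."""
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, pi]                       # (S, S) transitions under pi
    r_pi = r[idx, pi]                       # (S,) rewards under pi
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
    Q = r + gamma * (P @ V)
    return V, Q

def advantages(P, r, pi, gamma):
    """Advantage A^pi(s,a) = Q^pi(s,a) - V^pi(s)."""
    V, Q = policy_values(P, r, pi, gamma)
    return Q - V[:, None]

def balance_rewards(P, r, gamma, tol=1e-8, max_iter=10_000):
    """Hypothetical balancing loop: shift each state by its largest reward
    until every state's best reward is (numerically) zero.

    Returns the balanced rewards and the policy picking the best reward.
    """
    r = r.copy()
    for _ in range(max_iter):
        phi = r.max(axis=1)                 # per-state shift
        if np.abs(phi).max() < tol:
            break
        r = shift_rewards(P, r, phi, gamma)
    return r, r.argmax(axis=1)

# Small random MDP used only for illustration.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)           # normalize rows into distributions
r = rng.normal(size=(S, A))

pi = rng.integers(0, A, size=S)             # arbitrary policy
phi = rng.normal(size=S)                    # arbitrary shift
same = np.allclose(advantages(P, r, pi, gamma),
                   advantages(P, shift_rewards(P, r, phi, gamma), pi, gamma))

balanced_r, greedy_pi = balance_rewards(P, r, gamma)
```

In this sketch, same checks numerically that an arbitrary shift leaves the advantages of an arbitrary policy unchanged. After balancing, every reward is at most roughly zero and each state has an action with reward near zero, so choosing that action in every state gives the approximately optimal policy without any further computation.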

Taxonomy

Topics: Matrix Theory and Algorithms