Geometric Policy Iteration for Markov Decision Processes
Yue Wu, Jesús A. De Loera

TL;DR
This paper introduces Geometric Policy Iteration (GPI), a new policy iteration algorithm for discounted MDPs that exploits the geometric structure of the value function polytope to speed up convergence and to allow flexible, asynchronous updates.
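To make the value function polytope concrete: each policy π maps to its value function V^π = (I - γP^π)^{-1} r^π, a point in ℝ^{|S|}, and the value function polytope is the set of all such points. The minimal NumPy sketch below samples random stochastic policies on a toy two-state, two-action MDP and maps each one to its value point. The MDP numbers, the function name `policy_value`, and the array layout are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def policy_value(P, R, pi, gamma):
    """Return V^pi = (I - gamma * P_pi)^{-1} r_pi (hypothetical helper).

    P:  transition tensor, shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
    R:  reward matrix, shape (S, A)
    pi: stochastic policy, shape (S, A), each row sums to 1
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.einsum("sa,sa->s", pi, R)     # expected one-step reward under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# Toy 2-state, 2-action MDP with made-up random numbers.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))   # shape (2, 2, 2), rows sum to 1
R = rng.uniform(size=(2, 2))
gamma = 0.9

# Each random stochastic policy becomes one point V^pi in R^2;
# together the points sample the value function polytope.
policies = rng.dirichlet(np.ones(2), size=(500, 2))   # shape (500, 2, 2)
values = np.array([policy_value(P, R, pi, gamma) for pi in policies])
print(values.shape)  # (500, 2)
```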
Contribution
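The paper characterizes the boundary of the value function polytope using a hyperplane arrangement and develops GPI, which updates the policy one state at a time by switching to an action whose value lies on the polytope boundary. GPI matches the best known complexity bound for policy iteration and performs well empirically.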
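The one-state update can be sketched as follows. This is our reconstruction from the summary and the abstract, not the authors' code. For a deterministic policy π, every policy that differs from π only at state s has a value function on the line V^π + t·d with d = (I - γP^π)^{-1} e_s (a consequence of the Sherman-Morrison formula), and the step for action a works out to t_a = A(s, a) / (1 - γ (p_a - p_{π(s)})ᵀ d), where A is the advantage. The sketch switches s to the action with the largest t_a, which we take to be the boundary point referred to above, then immediately updates V^π and (I - γP^π)^{-1} with rank-one formulas. The helper names, the in-place rank-one bookkeeping, and the tolerance are assumptions.

```python
import numpy as np

def switch_one_state(P, R, pi, V, M_inv, s, gamma, tol=1e-10):
    """Hypothetical helper: move state s to its best action, in place.

    pi is a deterministic policy (int array), V = V^pi, and
    M_inv = (I - gamma * P_pi)^{-1}. Returns True if pi[s] changed.
    """
    d = M_inv[:, s].copy()                  # every one-state switch at s moves V along d
    u = P[s] - P[s, pi[s]]                  # change of row s of P_pi per action, shape (A, S)
    adv = R[s] + gamma * P[s] @ V - V[s]    # advantage A(s, a), shape (A,)
    denom = 1.0 - gamma * (u @ d)           # Sherman-Morrison denominator, always > 0
    step = adv / denom                      # V after switching to a is V + step[a] * d
    a = int(np.argmax(step))                # d[s] > 0, so this maximizes the new V(s)
    if step[a] <= tol:
        return False                        # no improving action at s
    V += step[a] * d                        # immediate value update
    M_inv += gamma * np.outer(d, u[a] @ M_inv) / denom[a]   # rank-one inverse update
    pi[s] = a
    return True

def geometric_pi_sketch(P, R, gamma, max_sweeps=1000):
    """Sweep states, switching one state at a time until no state improves."""
    S = P.shape[0]
    idx = np.arange(S)
    pi = np.zeros(S, dtype=int)                             # start with action 0 everywhere
    M_inv = np.linalg.inv(np.eye(S) - gamma * P[idx, pi])
    V = M_inv @ R[idx, pi]                                  # exact V^pi of the initial policy
    for _ in range(max_sweeps):
        changed = False
        for s in range(S):                                  # any visiting order is valid
            changed |= switch_one_state(P, R, pi, V, M_inv, s, gamma)
        if not changed:                                     # no positive advantage: pi is optimal
            break
    return pi, V

# Usage on a random MDP with made-up numbers.
rng = np.random.default_rng(1)
S, A, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))
pi, V = geometric_pi_sketch(P, R, gamma)
```

Each switch costs roughly O(|S|² + |A||S|) operations and keeps V exact, and because every accepted switch moves V along a componentwise nonnegative direction, values never decrease. That is why the inner loop may visit states in any order, which is how we read the asynchronous-update claim.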
Findings
GPI achieves the best known policy iteration complexity bound.
In experiments, GPI outperforms traditional methods on MDPs of various sizes.
GPI allows asynchronous state value updates, enhancing flexibility.
Abstract
Recently discovered polyhedral structures of the value function for finite state-action discounted Markov decision processes (MDP) shed light on understanding the success of reinforcement learning. We investigate the value function polytope in greater detail and characterize the polytope boundary using a hyperplane arrangement. We further show that the value space is a union of finitely many cells of the same hyperplane arrangement and relate it to the polytope of the classical linear programming formulation for MDPs. Inspired by these geometric properties, we propose a new algorithm, Geometric Policy Iteration (GPI), to solve discounted MDPs. GPI updates the policy of a single state by switching to an action that is mapped to the boundary of the value function polytope, followed by an immediate update of the value function. This new update rule aims at a faster value improvement…
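For reference, the classical linear program for a discounted MDP is shown below in its standard textbook form, with positive state weights μ(s) > 0 as an assumption of this presentation. Each state-action pair contributes one hyperplane V(s) = r(s,a) + γ Σ_{s'} P(s'|s,a) V(s') in the same space ℝ^{|S|} where value functions live. We assume this primal feasible region is the polytope the abstract refers to. The optimal value function V* is its unique minimizer.

```latex
\begin{aligned}
\min_{V \in \mathbb{R}^{|S|}} \quad & \sum_{s \in S} \mu(s)\, V(s) \\
\text{s.t.} \quad & V(s) \ge r(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a)\, V(s')
\qquad \forall\, s \in S,\ a \in A.
\end{aligned}
```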
