An Analysis of Primal-Dual Algorithms for Discounted Markov Decision Processes
Randy Cogill

TL;DR
This paper explores primal-dual algorithms for discounted Markov decision processes. It derives a closed-form optimal solution to the dual of the restricted primal, which yields an algorithm that is guaranteed to reach optimality in a finite number of iterations and that can be related to policy iteration.
Contribution
It introduces a new primal-dual algorithm for discounted MDPs with a closed-form dual solution, ensuring finite convergence to the optimal value function.
Findings
The derived algorithm is guaranteed to reach an optimal solution in a finite number of iterations.
The primal-dual method can be interpreted as repeated policy iteration.
The paper connects primal-dual algorithms to the complexity of policy iteration.
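Since the findings relate the primal-dual method to policy iteration, a minimal sketch of standard policy iteration for a discounted-cost MDP may help as a reference point. This is the textbook procedure, not the paper's primal-dual algorithm. The function names, the data layout (`P[a][s][t]` for transition probabilities, `c[s][a]` for costs), and the two-state example are all assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def policy_iteration(P, c, gamma):
    """Textbook policy iteration for a discounted-cost MDP.

    P[a][s][t] : probability of moving from state s to t under action a
    c[s][a]    : immediate cost of taking action a in state s
    gamma      : discount factor, 0 <= gamma < 1
    """
    n_states, n_actions = len(c), len(c[0])
    policy = [0] * n_states  # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = c_pi exactly.
        A = [[(1.0 if s == t else 0.0) - gamma * P[policy[s]][s][t]
              for t in range(n_states)] for s in range(n_states)]
        b = [c[s][policy[s]] for s in range(n_states)]
        v = solve_linear(A, b)

        def q(s, a):
            # One-step lookahead cost of action a in state s under v.
            return c[s][a] + gamma * sum(P[a][s][t] * v[t]
                                         for t in range(n_states))

        # Policy improvement: switch only on a strict decrease in cost.
        new_policy = []
        for s in range(n_states):
            best = policy[s]
            for a in range(n_actions):
                if q(s, a) < q(s, best) - 1e-12:
                    best = a
            new_policy.append(best)
        if new_policy == policy:
            return v, policy
        policy = new_policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [1.0, 0.0]],  # action 1
]
c = [[1.0, 2.0], [0.0, 0.5]]
v, policy = policy_iteration(P, c, 0.9)
```

In this example, staying in state 1 is free, so the cheapest behavior from state 0 is to pay the switching cost once. Because there are finitely many deterministic policies and each improvement step strictly lowers cost, the loop terminates after finitely many iterations.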
Abstract
Several well-known algorithms in the field of combinatorial optimization can be interpreted in terms of the primal-dual method for solving linear programs. For example, Dijkstra's algorithm, the Ford-Fulkerson algorithm, and the Hungarian algorithm can all be viewed as the primal-dual method applied to the linear programming formulations of their respective optimization problems. Roughly speaking, successfully applying the primal-dual method to an optimization problem that can be posed as a linear program relies on the ability to find a simple characterization of the optimal solutions to a related linear program, called the "dual of the restricted primal" (DRP). This paper is motivated by the following question: What is the algorithm we obtain if we apply the primal-dual method to a linear programming formulation of a discounted cost Markov decision process? We will first show that…
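For orientation, one standard linear programming formulation of a discounted-cost MDP is sketched below. The notation is an assumption and is not taken from the paper: states $s$, actions $a$, costs $c(s,a)$, transition probabilities $P(s' \mid s,a)$, discount factor $\gamma \in [0,1)$, and arbitrary positive weights $\alpha(s)$. Which of the two programs the paper treats as the primal is not stated in the abstract.

```latex
% Value-function LP: the optimal value function is the unique maximizer.
\begin{align*}
\max_{v} \quad & \sum_{s} \alpha(s)\, v(s) \\
\text{s.t.} \quad & v(s) \le c(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s')
  \quad \text{for all } s, a.
\end{align*}

% Its LP dual, over discounted state-action occupation measures x(s,a).
\begin{align*}
\min_{x} \quad & \sum_{s,a} c(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) - \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) = \alpha(s')
  \quad \text{for all } s', \\
& x(s,a) \ge 0 \quad \text{for all } s, a.
\end{align*}
```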
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Optimization and Search Problems
