Policy Optimization over General State and Action Spaces
Caleb Ju, Guanghui Lan

TL;DR
This paper develops new policy optimization algorithms for reinforcement learning in general state and action spaces, with convergence guarantees and practical robustness, extending existing methods to more complex environments.
Contribution
It introduces generalized policy mirror descent and dual averaging methods with function approximation, providing convergence analysis for broad RL settings.
Findings
The proposed methods converge linearly to global optimality in some settings and sublinearly in others.
The methods appear robust and competitive in preliminary tests.
The paper provides new theoretical frameworks for policy optimization over general state and action spaces.
Abstract
Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policies for each state. This prevents the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish linear convergence rate to global optimality or sublinear convergence to…
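For context, the tabular setting that the abstract contrasts against can be sketched in a few lines. The code below is not the paper's algorithm: it is a minimal illustration of policy mirror descent with a KL (entropy) mirror map on a small finite MDP, where every state can be enumerated and updated in turn. It assumes a cost-minimization convention, exact policy evaluation, and a constant step size; the names evaluate_q, pmd_step, run_pmd, P, c, gamma, and eta are hypothetical and chosen only for this example.

```python
import numpy as np


def evaluate_q(P, c, pi, gamma):
    """Exact Q^pi for a tabular MDP with costs c[s, a] and transitions P[s, a, s']."""
    n_states = c.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    c_pi = np.sum(pi * c, axis=1)          # expected one-step cost under pi
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)
    return c + gamma * P @ v               # Q[s, a] = c[s, a] + gamma * E[V(s')]


def pmd_step(log_pi, q, eta):
    """KL mirror descent update per state: pi_new(a|s) is proportional to pi(a|s) * exp(-eta * Q(s, a))."""
    logits = log_pi - eta * q
    m = logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_norm = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return logits - log_norm               # normalized log-probabilities


def run_pmd(P, c, gamma=0.9, eta=1.0, iters=50):
    """Run tabular policy mirror descent from the uniform policy (illustrative defaults)."""
    n_states, n_actions = c.shape
    log_pi = np.full((n_states, n_actions), -np.log(n_actions))
    for _ in range(iters):
        q = evaluate_q(P, c, np.exp(log_pi), gamma)
        log_pi = pmd_step(log_pi, q, eta)
    return np.exp(log_pi)
```

The loop relies on storing one probability vector per state, which is exactly what becomes impossible when the state space cannot be enumerated; that limitation is the motivation the abstract gives for combining the method with function approximation.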
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
