Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces
Ji Gao, Caleb Ju, Guanghui Lan, Zhaohui Tong

TL;DR
This paper introduces actor-accelerated Policy Dual Averaging (PDA), a reinforcement learning method that uses a learned policy network to approximate the optimization sub-problem PDA must solve at each decision step. This makes PDA practical in continuous action spaces while maintaining theoretical convergence guarantees.
Contribution
It proposes actor-accelerated PDA, an algorithm that replaces the per-step optimization sub-problem of action selection with the output of a learned policy network. This makes PDA practical in continuous action spaces while retaining convergence guarantees.
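As a rough illustration of what approximating the optimization step with a learned policy can mean, the sketch below uses a toy problem whose details are all assumptions, not the paper's setup. It has a scalar state s in [0, 1], a scalar action a, hypothetical quadratic Q-estimates Q_t(s, a) = (a - c_t * s)^2 accumulated with weights beta_t, and a quadratic regularizer with weight LAM. Instead of minimizing this accumulated objective separately at every state, a hypothetical linear actor a = w * s is trained once by stochastic gradient descent on the objective itself. After training, selecting an action takes a single multiplication. The paper's actor is a policy network, and its actual training objective may differ.

```python
import random

# Toy assumptions (hypothetical, not from the paper): scalar state s in [0, 1],
# scalar action a, quadratic Q-estimates Q_t(s, a) = (a - c_t * s) ** 2.
C = [1.0, 0.5, 2.0]     # hypothetical per-iteration Q-estimate parameters c_t
BETA = [1.0, 1.0, 1.0]  # hypothetical dual-averaging weights beta_t
LAM = 0.5               # hypothetical regularizer weight


def objective(s: float, a: float) -> float:
    """Accumulated weighted Q-estimates plus (LAM / 2) * a**2."""
    return sum(b * (a - c * s) ** 2 for b, c in zip(BETA, C)) + 0.5 * LAM * a * a


def grad_a(s: float, a: float) -> float:
    """Partial derivative of the objective with respect to the action a."""
    return sum(2.0 * b * (a - c * s) for b, c in zip(BETA, C)) + LAM * a


def exact_action(s: float) -> float:
    """Closed-form minimizer of the toy objective (set grad_a to zero)."""
    return 2.0 * s * sum(b * c for b, c in zip(BETA, C)) / (2.0 * sum(BETA) + LAM)


def fit_linear_actor(steps: int = 2000, lr: float = 0.05, seed: int = 0) -> float:
    """Train a hypothetical linear actor a = w * s by SGD over sampled states."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        s = rng.random()
        # Chain rule: d/dw objective(s, w * s) = grad_a(s, w * s) * s
        w -= lr * grad_a(s, w * s) * s
    return w


w = fit_linear_actor()


def actor(s: float) -> float:
    # Amortized action selection: one multiplication, no inner optimization loop.
    return w * s


print(round(w, 4), round(actor(0.8), 4), round(exact_action(0.8), 4))
```

In this toy, the objective is quadratic and the linear actor class contains the exact minimizer map. As a result, w converges to the closed-form coefficient 2 * sum(beta_t * c_t) / (2 * sum(beta_t) + LAM) = 7 / 6.5. With a richer sub-problem, a restricted actor class would generally leave a nonzero approximation error.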
Findings
Outperforms PPO on several robotics and control benchmarks.
Maintains convergence guarantees, under suitable assumptions, despite actor approximation error.
Reduces the computational cost of action selection by replacing a per-step optimization with a policy-network evaluation.
Abstract
Policy Dual Averaging (PDA) offers a principled Policy Mirror Descent (PMD) framework that more naturally admits value function approximation than standard PMD, enabling the use of approximate advantage (or Q-) functions while retaining strong convergence guarantees. However, applying PDA in continuous state and action spaces remains computationally challenging, since action selection involves solving an optimization sub-problem at each decision step. In this paper, we propose actor-accelerated PDA, which uses a learned policy network to approximate the solution of the optimization sub-problems, yielding faster runtimes while maintaining convergence guarantees. We provide a theoretical analysis that quantifies how actor approximation error impacts the convergence of PDA under suitable assumptions. We then evaluate its performance on several benchmarks in robotics, control, and…
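The abstract names two ingredients: solving a sub-problem at every decision step, and measuring how far an actor's output falls from that solution. Both can be sketched on a toy problem. Everything here is an assumption made for illustration, not the paper's formulation. The accumulated objective f_s(a) is quadratic in a scalar action, plain gradient descent serves as the inner solver, and the actor's approximation error is measured as the sub-problem suboptimality gap f_s(actor(s)) - min_a f_s(a). The names solve_subproblem, actor_gap, and imperfect_actor are hypothetical.

```python
# Toy assumptions (hypothetical, not from the paper): scalar state s and action a,
# f_s(a) = sum_t beta_t * (a - c_t * s) ** 2 + (LAM / 2) * a ** 2.
C = [1.0, 0.5, 2.0]     # hypothetical Q-estimate parameters c_t
BETA = [1.0, 1.0, 1.0]  # hypothetical dual-averaging weights beta_t
LAM = 0.5               # hypothetical regularizer weight


def objective(s: float, a: float) -> float:
    return sum(b * (a - c * s) ** 2 for b, c in zip(BETA, C)) + 0.5 * LAM * a * a


def grad_a(s: float, a: float) -> float:
    return sum(2.0 * b * (a - c * s) for b, c in zip(BETA, C)) + LAM * a


def solve_subproblem(s: float, lr: float = 0.1, tol: float = 1e-10,
                     max_iters: int = 10_000) -> tuple[float, int]:
    """Per-decision action selection by gradient descent; returns (action, iterations)."""
    a = 0.0
    for it in range(1, max_iters + 1):
        g = grad_a(s, a)
        if abs(g) < tol:
            return a, it
        a -= lr * g
    return a, max_iters


def actor_gap(actor, s: float) -> float:
    """Suboptimality of the actor's action in the sub-problem at state s."""
    a_star, _ = solve_subproblem(s)
    return objective(s, actor(s)) - objective(s, a_star)


def imperfect_actor(s: float) -> float:
    # Hypothetical actor whose coefficient (1.0) differs from the exact 7 / 6.5.
    return 1.0 * s


a_star, iters = solve_subproblem(0.8)
gap = actor_gap(imperfect_actor, 0.8)
print(f"solver: a* = {a_star:.4f} after {iters} gradient steps")
print(f"actor gap at s = 0.8: {gap:.6f}")
```

In this toy, the inner solver needs about two dozen gradient steps at s = 0.8, while the actor needs a single evaluation. The gap is one plausible per-state error measure of the kind whose effect on convergence the paper analyzes. The paper's precise error definition may differ.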
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
