Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces

Ji Gao; Caleb Ju; Guanghui Lan; Zhaohui Tong

arXiv:2603.10199·cs.LG·March 12, 2026

Open Access

TL;DR

This paper introduces actor-accelerated Policy Dual Averaging (PDA), a reinforcement learning method for continuous action spaces in which a learned policy network approximates the optimization sub-problem that PDA otherwise solves at every decision step, giving faster runtimes while retaining theoretical convergence guarantees.

Contribution

It proposes actor-accelerated PDA, an algorithm that replaces the exact solution of each optimization sub-problem with the output of a learned policy, making PDA practical in continuous action spaces while retaining convergence guarantees.

Findings

01 Achieves superior performance over PPO in benchmarks.

02 Maintains convergence guarantees despite approximation errors.

03 Reduces computational complexity of action selection.

Abstract

Policy Dual Averaging (PDA) offers a principled Policy Mirror Descent (PMD) framework that more naturally admits value function approximation than standard PMD, enabling the use of approximate advantage (or Q-) functions while retaining strong convergence guarantees. However, applying PDA in continuous state and action spaces remains computationally challenging, since action selection involves solving an optimization sub-problem at each decision step. In this paper, we propose actor-accelerated PDA, which uses a learned policy network to approximate the solution of the optimization sub-problems, yielding faster runtimes while maintaining convergence guarantees. We provide a theoretical analysis that quantifies how actor approximation error impacts the convergence of PDA under suitable assumptions. We then evaluate its performance on several benchmarks in robotics, control, and…
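
To make the idea in the abstract concrete, the following is a minimal toy sketch, not the authors' algorithm. It assumes a one-dimensional action, hypothetical quadratic cost estimates `(a - theta_t * s)^2`, a squared-distance regularizer, and a linear "actor" fitted by least squares; all names and constants are illustrative assumptions. It shows the two ways of selecting an action: running an iterative solver on the sub-problem for each state, or evaluating a cheap fitted map that approximates the solver's output.

```python
import numpy as np

# Hypothetical setup (not from the paper): three past cost estimates
# Q_t(s, a) = (a - theta_t * s)^2 with equal averaging weights.
thetas = [0.5, 1.0, 1.5]
weights = [1.0, 1.0, 1.0]
lam = 1.0  # assumed regularization strength


def solve_subproblem(s, a0=0.0, steps=200, lr=0.05):
    """Gradient descent on sum_t w_t * (a - theta_t * s)^2 + (lam / 2) * (a - a0)^2."""
    a = a0
    for _ in range(steps):
        grad = sum(2.0 * w * (a - th * s) for w, th in zip(weights, thetas))
        grad += lam * (a - a0)
        a -= lr * grad
    return a


# "Actor": fit a linear map a = k * s to the solver's outputs on sampled states.
states = np.linspace(-1.0, 1.0, 21)
targets = np.array([solve_subproblem(s) for s in states])
k = float(states @ targets / (states @ states))


def actor(s):
    """Cheap approximation of the sub-problem solution: one multiplication."""
    return k * s


# Compare the expensive solve with the actor's approximation at a new state.
s_new = 0.3
print(solve_subproblem(s_new), actor(s_new))
```

In this toy problem the exact minimizer is linear in the state, so the linear actor reproduces it almost perfectly. With a neural network actor and learned value estimates the approximation would generally be inexact, which is the kind of error the paper's analysis is said to quantify.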

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control