Zeroth-order Deterministic Policy Gradient

Harshat Kumar; Dionysios S. Kalogerias; George J. Pappas; Alejandro Ribeiro

arXiv:2006.07314 · cs.LG · July 14, 2020 · 5 cites

Open Access

TL;DR

ZDPG introduces a critic-free, model-free deterministic policy gradient method using two-point stochastic evaluations, achieving improved stability and sample complexity in reinforcement learning tasks.

Contribution

The paper proposes ZDPG, a critic-free approach that approximates policy-reward gradients from two-point stochastic evaluations of the $Q$-function, improving stability and sample efficiency over existing methods.

Findings

1. ZDPG is effective in practical reinforcement learning scenarios.
2. It offers improved finite sample complexity bounds.
3. ZDPG outperforms traditional PG and baseline methods in experiments.

Abstract

Deterministic Policy Gradient (DPG) removes a level of randomness from standard randomized-action Policy Gradient (PG), and demonstrates substantial empirical success for tackling complex dynamic problems involving Markov decision processes. At the same time, though, DPG loses its ability to learn in a model-free (i.e., actor-only) fashion, frequently necessitating the use of critics in order to obtain consistent estimates of the associated policy-reward gradient. In this work, we introduce Zeroth-order Deterministic Policy Gradient (ZDPG), which approximates policy-reward gradients via two-point stochastic evaluations of the $Q$-function, constructed by properly designed low-dimensional action-space perturbations. Exploiting the idea of random horizon rollouts for obtaining unbiased estimates of the $Q$-function, ZDPG lifts the dependence on critics and restores true model-free policy…
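The two ingredients named in the abstract, random-horizon rollouts and two-point evaluations of the $Q$-function under action-space perturbations, can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`random_horizon_q`, `two_point_action_gradient`), the callables `step`, `reward`, and `policy`, the geometric horizon, the Gaussian perturbation, and the one-sided difference are all choices made here for illustration, and the paper's exact constructions may differ.

```python
import numpy as np


def random_horizon_q(step, reward, policy, s, a, gamma, rng):
    """Single-rollout estimate of Q(s, a) using a random horizon.

    Assumption (not taken from the paper): the horizon T satisfies
    P(T >= t) = gamma**t, so the undiscounted sum of T + 1 rewards has the
    same expectation as the discounted infinite-horizon return.
    """
    # NumPy's geometric is supported on {1, 2, ...}; shift it to {0, 1, ...}.
    T = rng.geometric(1.0 - gamma) - 1
    total = reward(s, a)          # reward at t = 0 for the queried pair
    for _ in range(T):
        s = step(s, a, rng)       # hypothetical environment transition
        a = policy(s)             # follow the deterministic policy
        total += reward(s, a)
    return total


def two_point_action_gradient(q_estimate, s, a, mu, rng):
    """Two-point zeroth-order estimate of the gradient of Q w.r.t. the action.

    Assumption: Gaussian direction u and a one-sided difference. In
    expectation this targets the gradient of a Gaussian-smoothed Q.
    """
    u = rng.standard_normal(a.shape)  # random direction in action space
    # Finite difference of Q along u, scaled back onto u.
    return (q_estimate(s, a + mu * u) - q_estimate(s, a)) / mu * u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy Q with a known action gradient, -2 * (a - s), for a sanity check.
    toy_q = lambda s, a: -float(np.sum((a - s) ** 2))
    s0, a0 = np.zeros(2), np.array([1.0, -1.0])
    samples = [two_point_action_gradient(toy_q, s0, a0, 1e-3, rng)
               for _ in range(20000)]
    print("averaged estimate:", np.mean(samples, axis=0))
    print("exact gradient:   ", -2.0 * (a0 - s0))
```

In a deterministic policy gradient update, an action-gradient estimate of this kind would be chained with the Jacobian of the policy with respect to its parameters; that step is omitted here. Passing `random_horizon_q` (with the environment callables bound) as `q_estimate` combines the two pieces without a learned critic.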

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques

Methods: Deterministic Policy Gradient