Fourier Policy Gradients

Matthew Fellows; Kamil Ciosek; Shimon Whiteson

arXiv:1802.06891 · cs.LG · May 31, 2018 · 1 citation

TL;DR

This paper introduces a Fourier analysis-based method for deriving policy gradient updates in reinforcement learning, enabling low-variance estimates and broad applicability to various policy and critic function families.

Contribution

It presents a novel Fourier analysis approach to derive analytical policy gradient solutions and unifies sample-based estimators, expanding the scope of policy gradient methods.

Findings

1. Analytical solutions for policy gradients with low variance.

2. Applicability to diverse critic functions like trigonometric and radial basis functions.

3. Unified framework for sample-based stochastic policy gradient estimators.

Abstract

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients (EPG) as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by…
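The idea of replacing an integral over actions with a closed-form expression can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a one-dimensional Gaussian policy N(mu, sigma^2) and a single trigonometric critic term Q(a) = cos(omega * a), and it uses the standard Gaussian characteristic function, which gives E[cos(omega * a)] = exp(-omega^2 sigma^2 / 2) cos(omega * mu). All function names and parameter values below are hypothetical and chosen only for illustration. The sketch compares the analytic gradient with respect to mu against a sample-based score-function estimate.

```python
import math
import random


def analytic_expected_q(mu, sigma, omega):
    """E[cos(omega * a)] for a ~ N(mu, sigma^2), from the Gaussian characteristic function."""
    return math.exp(-0.5 * (omega * sigma) ** 2) * math.cos(omega * mu)


def analytic_grad_mu(mu, sigma, omega):
    """Exact derivative of analytic_expected_q with respect to mu (no sampling)."""
    return -omega * math.exp(-0.5 * (omega * sigma) ** 2) * math.sin(omega * mu)


def score_function_grad_mu(mu, sigma, omega, n_samples, rng):
    """Monte Carlo estimate of the same gradient: mean of Q(a) * d/dmu log pi(a)."""
    total = 0.0
    for _ in range(n_samples):
        a = rng.gauss(mu, sigma)
        # For a Gaussian policy, d/dmu log pi(a) = (a - mu) / sigma^2.
        total += math.cos(omega * a) * (a - mu) / sigma ** 2
    return total / n_samples


# Illustrative (assumed) parameter values.
rng = random.Random(0)
mu, sigma, omega = 0.3, 0.5, 2.0

exact = analytic_grad_mu(mu, sigma, omega)
estimate = score_function_grad_mu(mu, sigma, omega, 200_000, rng)
print(f"analytic gradient:       {exact:.4f}")
print(f"score-function estimate: {estimate:.4f}")
```

The analytic gradient is deterministic, while the sampled estimate fluctuates from run to run and needs many samples to get close. That gap is the kind of variance reduction the abstract attributes to analytical solutions, shown here only for this one assumed policy and critic pair.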

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Fuel Cells and Related Materials