Leveraging Reward Gradients For Reinforcement Learning in Differentiable Physics Simulations

Sean Gillen; Katie Byl

arXiv:2203.02857·cs.LG·March 8, 2022·1 cites

Open Access

TL;DR

This paper introduces a novel algorithm that effectively utilizes reward gradients in differentiable physics simulators to improve reinforcement learning performance on complex control tasks.

Contribution

The paper presents the cross entropy analytic policy gradients algorithm, enabling better use of reward gradients in differentiable physics simulations for reinforcement learning.

Findings

01 Outperforms state-of-the-art deep reinforcement learning algorithms

02 Successfully applies to challenging nonlinear control problems

03 Demonstrates the practical utility of reward gradients in physics-based RL

Abstract

In recent years, fully differentiable rigid body physics simulators have been developed, which can be used to simulate a wide range of robotic systems. In the context of reinforcement learning for control, these simulators theoretically allow algorithms to be applied directly to analytic gradients of the reward function. However, to date, these gradients have proved extremely challenging to use, and are outclassed by algorithms using no gradient information at all. In this work, we present a novel algorithm, cross entropy analytic policy gradients, that is able to leverage these gradients to outperform state-of-the-art deep reinforcement learning on a set of challenging nonlinear control problems.
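As a rough illustration of what "analytic gradients of the reward function" means in this setting, the sketch below differentiates the total reward of a short rollout with respect to policy parameters by propagating state sensitivities through every simulation step. Everything in it is an assumption made for illustration: the toy unit-mass system, the linear feedback policy, the quadratic reward, and all function names are hypothetical. It is not the paper's algorithm or simulator, and it does not include the cross entropy component.

```python
def rollout_return_and_grad(k, x0=1.0, v0=0.0, dt=0.05, steps=100):
    """Simulate a toy unit point mass under u = -k1*x - k2*v.

    Returns (total_reward, gradient of total_reward w.r.t. [k1, k2]).
    The gradient is exact for this discrete rollout: it is obtained by
    carrying dx/dk and dv/dk forward alongside the state (forward-mode
    sensitivities), not by sampling or finite differences.
    """
    k1, k2 = k
    x, v = x0, v0
    dx = [0.0, 0.0]  # dx[i] = d x / d k_i
    dv = [0.0, 0.0]  # dv[i] = d v / d k_i
    total = 0.0
    grad = [0.0, 0.0]
    for _ in range(steps):
        # Policy and its total derivative w.r.t. each gain
        # (direct dependence plus dependence through the state).
        u = -k1 * x - k2 * v
        du = [-x - k1 * dx[0] - k2 * dv[0],
              -v - k1 * dx[1] - k2 * dv[1]]

        # Hypothetical reward: r = -(x^2 + 0.1 v^2 + 0.01 u^2).
        total += -(x * x + 0.1 * v * v + 0.01 * u * u)
        for i in range(2):
            grad[i] += -(2.0 * x * dx[i] + 0.2 * v * dv[i] + 0.02 * u * du[i])

        # Semi-implicit Euler step, differentiated term by term.
        v_new = v + dt * u
        dv_new = [dv[i] + dt * du[i] for i in range(2)]
        x = x + dt * v_new
        dx = [dx[i] + dt * dv_new[i] for i in range(2)]
        v, dv = v_new, dv_new
    return total, grad


def finite_difference_grad(k, eps=1e-6):
    """Central-difference estimate of the same gradient, used as a check."""
    estimate = []
    for i in range(len(k)):
        hi, lo = list(k), list(k)
        hi[i] += eps
        lo[i] -= eps
        r_hi, _ = rollout_return_and_grad(hi)
        r_lo, _ = rollout_return_and_grad(lo)
        estimate.append((r_hi - r_lo) / (2.0 * eps))
    return estimate


gains = [1.0, 1.0]
total_reward, analytic = rollout_return_and_grad(gains)
numeric = finite_difference_grad(gains)
print("total reward:", total_reward)
print("analytic gradient:", analytic)
print("finite-difference gradient:", numeric)
```

In a differentiable simulator this bookkeeping would normally be handled by automatic differentiation rather than written by hand. The hand-derived version is used here only so the example runs with the standard library alone.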

Taxonomy

Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Model Reduction and Neural Networks