Leveraging Reward Gradients For Reinforcement Learning in Differentiable Physics Simulations
Sean Gillen, Katie Byl

TL;DR
This paper introduces a novel algorithm that effectively utilizes reward gradients in differentiable physics simulators to improve reinforcement learning performance on complex control tasks.
Contribution
The paper presents the cross entropy analytic policy gradients algorithm, enabling better use of reward gradients in differentiable physics simulations for reinforcement learning.
Findings
Outperforms state-of-the-art deep reinforcement learning algorithms
Successfully applies to challenging nonlinear control problems
Demonstrates the practical utility of reward gradients in physics-based RL
Abstract
In recent years, fully differentiable rigid body physics simulators have been developed, which can be used to simulate a wide range of robotic systems. In the context of reinforcement learning for control, these simulators theoretically allow algorithms to be applied directly to analytic gradients of the reward function. However, to date, these gradients have proved extremely challenging to use, and are outclassed by algorithms using no gradient information at all. In this work we present a novel algorithm, cross entropy analytic policy gradients, that is able to leverage these gradients to outperform state-of-the-art deep reinforcement learning on a set of challenging nonlinear control problems.
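To make "analytic gradients of the reward function" concrete, the following is a minimal sketch under stated assumptions. It is not the paper's cross entropy analytic policy gradients algorithm. It assumes a toy unit point mass with explicit Euler integration, a linear feedback policy, and a quadratic reward, and it differentiates the rollout by hand (forward-mode sensitivities) rather than with a real differentiable simulator. The names `rollout`, `DT`, `HORIZON`, `k_pos`, and `k_vel` are hypothetical and chosen only for illustration.

```python
# Toy illustration only: a hand-differentiated point-mass "simulator".
# All names and constants here are hypothetical; none come from the paper.

DT = 0.05       # integration step (assumed)
HORIZON = 100   # number of simulated steps (assumed)


def rollout(k_pos, k_vel, x0=1.0, v0=0.0):
    """Simulate one episode and return (total_reward, dR/dk_pos, dR/dk_vel).

    Dynamics: unit point mass, explicit Euler.
    Policy:   u = -k_pos * x - k_vel * v.
    Reward:   -(x^2 + 0.1 * u^2) * DT at every step.
    The derivatives of the state with respect to both gains are propagated
    alongside the state itself, so the returned gradient is exact for this
    discrete-time rollout.
    """
    x, v = x0, v0
    dx = [0.0, 0.0]      # d x / d (k_pos, k_vel)
    dv = [0.0, 0.0]      # d v / d (k_pos, k_vel)
    total = 0.0
    dtotal = [0.0, 0.0]  # d total / d (k_pos, k_vel)

    for _ in range(HORIZON):
        # Policy and its derivative: indirect terms through the state,
        # plus the direct dependence of u on each gain.
        u = -k_pos * x - k_vel * v
        du = [-k_pos * dx[i] - k_vel * dv[i] for i in range(2)]
        du[0] -= x
        du[1] -= v

        # Reward and its derivative via the chain rule.
        total += -(x * x + 0.1 * u * u) * DT
        for i in range(2):
            dtotal[i] += -(2.0 * x * dx[i] + 0.2 * u * du[i]) * DT

        # Euler step for the state and for the sensitivities.
        x_new = x + DT * v
        v_new = v + DT * u
        dx = [dx[i] + DT * dv[i] for i in range(2)]
        dv = [dv[i] + DT * du[i] for i in range(2)]
        x, v = x_new, v_new

    return total, dtotal[0], dtotal[1]
```

A quick sanity check is to compare the returned derivatives against central finite differences of `rollout(...)[0]`, and to confirm that a small step along the gradient increases the total reward. A differentiable physics simulator provides this kind of derivative automatically for far more complex rigid body systems; the abstract's point is that using such gradients effectively has been difficult in practice.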
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Model Reduction and Neural Networks
