Do Differentiable Simulators Give Better Policy Gradients?

H.J. Terry Suh; Max Simchowitz; Kaiqing Zhang; Russ Tedrake

arXiv:2202.00817 · cs.LG · August 23, 2022 · 5 cites

TL;DR

This paper investigates the effectiveness of differentiable simulators for policy gradient estimation in reinforcement learning, highlighting their limitations in complex physical systems and proposing a new hybrid gradient estimator.

Contribution

It introduces an $\alpha$-order gradient estimator that combines the strengths of first- and zero-order methods, improving robustness and efficiency.

Findings

01

First-order estimators can be biased or high-variance in complex landscapes.

02

The $\alpha$-order estimator balances efficiency and robustness.

03

Numerical examples show the advantages of the $\alpha$-order estimator.

Abstract

Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0, 1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of…
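To make the two estimators in the abstract concrete, here is a minimal NumPy sketch on a toy problem. It is an illustration under stated assumptions, not the authors' implementation: the objective is assumed to be a Gaussian-smoothed function $F(\theta) = \mathbb{E}_w[f(\theta + w)]$ with $w \sim \mathcal{N}(0, \sigma^2 I)$, the toy `f` is a quadratic, and the $\alpha$-order estimator is assumed to be a plain convex combination of the first- and zeroth-order estimates (the abstract shown here is truncated and does not say how $\alpha$ is chosen). All function and parameter names are hypothetical.

```python
import numpy as np


def f(x):
    # Toy smooth objective (an assumption for illustration): f(x) = ||x||^2.
    return np.sum(x ** 2, axis=-1)


def grad_f(x):
    # Exact gradient of the toy objective, standing in for what a
    # differentiable simulator would return.
    return 2.0 * x


def sample_noise(rng, n, dim, sigma):
    # n Gaussian perturbations w_i ~ N(0, sigma^2 I), shape (n, dim).
    return sigma * rng.standard_normal((n, dim))


def first_order(theta, w):
    # First-order estimate: average the exact gradients at perturbed points.
    return grad_f(theta + w).mean(axis=0)


def zeroth_order(theta, w, sigma):
    # Zeroth-order estimate: uses function values only, with f(theta)
    # subtracted as a baseline to reduce variance.
    diffs = f(theta + w) - f(theta)          # shape (n,)
    return (diffs[:, None] * w).mean(axis=0) / sigma ** 2


def alpha_order(theta, w, sigma, alpha):
    # Assumed form: convex combination of the two estimates, alpha in [0, 1].
    # alpha = 1 recovers first-order, alpha = 0 recovers zeroth-order.
    g1 = first_order(theta, w)
    g0 = zeroth_order(theta, w, sigma)
    return alpha * g1 + (1.0 - alpha) * g0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.array([1.0, -2.0])
    sigma = 0.1
    w = sample_noise(rng, 10_000, theta.size, sigma)
    for a in (0.0, 0.5, 1.0):
        print(a, alpha_order(theta, w, sigma, a))
```

On this smooth toy problem both estimates target the same gradient, $2\theta$, and the first-order one has far lower variance. The cases the abstract is concerned with (stiffness, discontinuities) are exactly those where that advantage can break down, which is what motivates interpolating between the two.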

Code & Models

Repositories

michael-cummins/DeePC-HUNT
pytorch

Taxonomy

Topics: Model Reduction and Neural Networks · Simulation Techniques and Applications · Probabilistic and Robust Engineering Design