TL;DR
This paper introduces ClipUp, a simple yet effective optimizer for distribution-based policy evolution in reinforcement learning, offering advantages over Adam in hyperparameter tuning and robustness.
Contribution
The paper proposes ClipUp, a momentum-based optimizer with gradient normalization and update clipping, tailored for distribution-based policy evolution, simplifying hyperparameter tuning and improving robustness.
Findings
ClipUp performs competitively with Adam in reinforcement learning tasks.
It simplifies hyperparameter tuning and adapts well to reward scale changes.
It is effective on challenging continuous control benchmarks, including Humanoid.
Abstract
Distribution-based search algorithms are an effective approach for evolutionary reinforcement learning of neural network controllers. In these algorithms, gradients of the total reward with respect to the policy parameters are estimated using a population of solutions drawn from a search distribution, and then used for policy optimization with stochastic gradient ascent. A common choice in the community is to use the Adam optimization algorithm for obtaining an adaptive behavior during gradient ascent, due to its success in a variety of supervised learning settings. As an alternative to Adam, we propose to enhance classical momentum-based gradient ascent with two simple techniques: gradient normalization and update clipping. We argue that the resulting optimizer called ClipUp (short for "clipped updates") is a better choice for distribution-based policy evolution because its working…
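The gradient-estimation step the abstract describes can be sketched as follows. This is one possible estimator, not necessarily the one the paper uses: it assumes an isotropic Gaussian search distribution with a fixed standard deviation `sigma` and antithetic (mirrored) sampling. The names `estimate_gradient`, `fitness`, and `num_pairs` are hypothetical.

```python
import random


def estimate_gradient(center, fitness, sigma=0.1, num_pairs=50, rng=None):
    """Estimate the gradient of expected fitness at `center`.

    Assumes an isotropic Gaussian search distribution and mirrored
    sampling; both are illustrative choices, not taken from the paper.
    """
    if rng is None:
        rng = random.Random(0)
    dim = len(center)
    grad = [0.0] * dim
    for _ in range(num_pairs):
        # Draw one perturbation and evaluate it in both directions.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = [c + sigma * e for c, e in zip(center, eps)]
        minus = [c - sigma * e for c, e in zip(center, eps)]
        diff = fitness(plus) - fitness(minus)
        # Weight the perturbation by the fitness difference it produced.
        for i in range(dim):
            grad[i] += diff * eps[i]
    # Average over pairs and undo the 2 * sigma finite-difference scale.
    scale = 1.0 / (2.0 * sigma * num_pairs)
    return [scale * g for g in grad]
```

In a reinforcement learning setting, `fitness` would run an episode with the given policy parameters and return the total reward. The resulting estimate is then handed to the optimizer (Adam or ClipUp) for the ascent step.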
Taxonomy
Methods: Adam
