Structured Evolution with Compact Architectures for Scalable Policy Optimization

Krzysztof Choromanski; Mark Rowland; Vikas Sindhwani; Richard E. Turner; Adrian Weller

arXiv:1804.02395·cs.LG·June 13, 2018·52 cites

Open Access

TL;DR

This paper introduces a structured random orthogonal matrix-based gradient estimation method for blackbox optimization, enabling the learning of compact, efficient policies with theoretical guarantees and practical advantages in robotics and high-dimensional spaces.

Contribution

The paper proposes a novel structured gradient estimator that improves accuracy and scalability, facilitating the training of compact policies with theoretical support and practical efficiency.

Findings

01 Achieves better policy quality with fewer parameters.

02 Provides faster training and inference for compact policies.

03 Solves robotics tasks with significantly fewer parameters than previous methods.
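
To make the parameter savings concrete, the sketch below shows one way a policy layer can be made compact: a Toeplitz weight matrix, in which every diagonal shares a single value, so an n_out x n_in layer needs n_out + n_in - 1 parameters instead of n_out * n_in. The Toeplitz choice, the tanh nonlinearity, and the names `toeplitz_from_params` and `compact_policy` are illustrative assumptions and are not taken from this page.

```python
import numpy as np


def toeplitz_from_params(params, n_out, n_in):
    """Build an n_out x n_in Toeplitz matrix from n_out + n_in - 1 free parameters."""
    params = np.asarray(params, dtype=float)
    if params.shape != (n_out + n_in - 1,):
        raise ValueError("expected n_out + n_in - 1 parameters")
    rows = np.arange(n_out)[:, None]
    cols = np.arange(n_in)[None, :]
    # Entry (i, j) depends only on i - j, so each diagonal shares one parameter.
    return params[rows - cols + n_in - 1]


def compact_policy(params, observation, n_out):
    """Hypothetical single-layer policy: action = tanh(W @ observation), W Toeplitz."""
    obs = np.asarray(observation, dtype=float)
    weights = toeplitz_from_params(params, n_out, obs.shape[0])
    return np.tanh(weights @ obs)
```

With 8 actions and 24 observation features, for example, this layer would have 31 free parameters where a dense layer would have 192.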

Abstract

We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees. We show that this algorithm can be successfully applied to learn better quality compact policies than those using standard gradient estimation techniques. The compact policies we learn have several advantages over unstructured ones, including faster training algorithms and faster inference. These benefits are important when the policy is deployed on real hardware with limited resources. Further, compact policies provide more scalable architectures for derivative-free optimization (DFO) in high-dimensional spaces. We show that most robotics tasks from the OpenAI Gym can be solved using neural networks with less than 300 parameters, with almost linear time complexity of…
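
As a rough illustration of the idea in the abstract, the following sketch estimates a blackbox gradient from antithetic finite differences along mutually orthogonal random directions. It is a minimal sketch under stated assumptions, not the paper's implementation: the directions come from a QR decomposition of a Gaussian matrix with norms redrawn from a chi distribution, which may differ from the structured matrices the authors use, and the names `orthogonal_directions`, `antithetic_gradient`, and `sigma` are hypothetical.

```python
import numpy as np


def orthogonal_directions(n_dirs, dim, rng):
    """Sample n_dirs <= dim mutually orthogonal directions with Gaussian-like norms."""
    if n_dirs > dim:
        raise ValueError("at most dim orthogonal directions exist in dim dimensions")
    gauss = rng.standard_normal((dim, dim))
    q, _ = np.linalg.qr(gauss)            # columns of q are orthonormal
    dirs = q[:, :n_dirs].T                # rows are orthonormal, shape (n_dirs, dim)
    # Assumption: rescale each row so its length is distributed like the norm
    # of a dim-dimensional standard Gaussian vector.
    norms = np.sqrt(rng.chisquare(dim, size=n_dirs))
    return dirs * norms[:, None]


def antithetic_gradient(f, theta, sigma, directions):
    """Estimate the gradient of f at theta from paired evaluations at theta +/- sigma * eps."""
    grad = np.zeros_like(theta, dtype=float)
    for eps in directions:
        diff = f(theta + sigma * eps) - f(theta - sigma * eps)
        grad += diff * eps
    return grad / (2.0 * sigma * len(directions))
```

In a policy-search loop, `f` would map a parameter vector to an episode return, and the estimate would feed a gradient ascent step on `theta`.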

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Reinforcement Learning in Robotics