Efficiently Learning Small Policies for Locomotion and Manipulation
Shashank Hegde, Gaurav S. Sukhatme

TL;DR
This paper applies graph hyper networks to off-policy reinforcement learning to produce small robot policies that perform comparably to much larger networks and can be sized to fit a system's memory constraints.
Contribution
The authors present a graph hyper network approach that produces significantly smaller policies with performance comparable to much larger networks, demonstrated on locomotion and manipulation tasks.
Findings
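The learned policy networks are two orders of magnitude smaller than commonly used networks, yet encode comparable policies.
Training multiple policies of different sizes is as sample efficient as training a single policy.
The method can be appended to any off-policy RL algorithm without any change in hyperparameters.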
Abstract
Neural control of memory-constrained, agile robots requires small, yet highly performant models. We leverage graph hyper networks to learn graph hyper policies trained with off-policy reinforcement learning resulting in networks that are two orders of magnitude smaller than commonly used networks yet encode policies comparable to those encoded by much larger networks trained on the same task. We show that our method can be appended to any off-policy reinforcement learning algorithm, without any change in hyperparameters, by showing results across locomotion and manipulation tasks. Further, we obtain an array of working policies, with differing numbers of parameters, allowing us to pick an optimal network for the memory constraints of a system. Training multiple policies with our method is as sample efficient as training a single policy. Finally, we provide a method to select the best…
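To make the memory-constraint idea concrete, here is a minimal Python sketch of counting the parameters of fully connected policies and filtering candidates by a memory budget. It is an illustration only, not the authors' selection method (which the truncated abstract does not describe). The names `mlp_param_count`, `Candidate`, and `pick_policy`, the 4-bytes-per-parameter assumption, and the use of a scalar evaluation score are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


def mlp_param_count(obs_dim: int, hidden: Sequence[int], act_dim: int) -> int:
    """Count weights and biases of a fully connected network."""
    sizes = [obs_dim, *hidden, act_dim]
    # Each layer contributes an (n_in x n_out) weight matrix plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


@dataclass(frozen=True)
class Candidate:
    hidden: Tuple[int, ...]  # hidden layer widths (hypothetical architecture description)
    params: int              # total parameter count
    score: float             # evaluation return supplied by the user (placeholder)


def pick_policy(
    candidates: Sequence[Candidate],
    budget_bytes: int,
    bytes_per_param: int = 4,  # assumes float32 storage
) -> Optional[Candidate]:
    """Return the best-scoring candidate that fits the budget, or None."""
    feasible = [c for c in candidates if c.params * bytes_per_param <= budget_bytes]
    return max(feasible, key=lambda c: c.score, default=None)
```

For example, with an assumed 17-dimensional observation and 6-dimensional action, `mlp_param_count(17, (256, 256), 6)` gives 71942 parameters, while `mlp_param_count(17, (4, 4), 6)` gives 122. These dimensions are chosen only to show the arithmetic and are not taken from the paper.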
Taxonomy
Topics: Reinforcement Learning in Robotics
