HyperPPO: A scalable method for finding small policies for robotic control
Shashank Hegde, Zhehui Huang, Gaurav S. Sukhatme

TL;DR
HyperPPO is a scalable reinforcement learning method that efficiently finds small, high-performing neural network policies for robotic control by using graph hypernetworks to estimate the weights of multiple architectures simultaneously.
Contribution
It introduces HyperPPO, an on-policy RL algorithm that leverages graph hypernetworks to concurrently estimate weights for multiple neural architectures, enabling efficient discovery of small, performant policies.
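The minimal sketch below illustrates the weight-sharing idea in plain Python. It is a toy built on assumptions, not HyperPPO. The paper uses a graph hypernetwork, which this summary does not detail. This sketch swaps in a plain (non-graph) hypernetwork whose shared parameters `theta` map a fixed encoding of each connection (layer, input unit, output unit) to one weight. The names `ToyHyperNetwork`, `generate`, and `mlp_forward` are hypothetical. It shows that one shared parameter set can emit weights for MLPs of different widths and depths. That is why, in this family of methods, updating the shared parameters affects every architecture at once.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


class ToyHyperNetwork:
    """Hypothetical toy hypernetwork (not the paper's graph hypernetwork).

    A single shared parameter vector `theta` produces weights for any MLP
    described by its list of layer widths.
    """

    def __init__(self, embed_dim: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        # Shared generator parameters, reused for every architecture.
        self.theta = [rng.gauss(0.0, 0.5) for _ in range(3 * embed_dim)]

    def _features(self, layer: int, i: int, j: int) -> List[float]:
        # Fixed sinusoidal encoding of (layer index, input unit, output unit).
        feats = []
        for idx, value in enumerate((layer, i, j)):
            for k in range(self.embed_dim):
                feats.append(math.sin((value + 1) * (k + 1) * 0.37 + idx))
        return feats

    def generate(self, widths: List[int]) -> List[Matrix]:
        """Return one weight matrix per layer, shaped [n_out][n_in]."""
        weights = []
        for layer in range(len(widths) - 1):
            n_in, n_out = widths[layer], widths[layer + 1]
            matrix = [
                [
                    math.tanh(sum(t * f for t, f in zip(self.theta, self._features(layer, i, j))))
                    / math.sqrt(n_in)  # scale so outputs stay bounded
                    for i in range(n_in)
                ]
                for j in range(n_out)
            ]
            weights.append(matrix)
        return weights


def mlp_forward(weights: List[Matrix], x: List[float]) -> List[float]:
    """Forward pass: tanh on hidden layers, linear output layer, no biases."""
    h = x
    for layer, matrix in enumerate(weights):
        h = [sum(w * v for w, v in zip(row, h)) for row in matrix]
        if layer < len(weights) - 1:
            h = [math.tanh(v) for v in h]
    return h


if __name__ == "__main__":
    hyper = ToyHyperNetwork()
    obs = [0.1, -0.2, 0.3, 0.0]  # hypothetical 4-D observation
    for widths in ([4, 8, 2], [4, 16, 16, 2]):
        action = mlp_forward(hyper.generate(widths), obs)
        print(widths, len(action))
```

Both architectures receive their weights from the same `theta`. A real graph hypernetwork would instead condition on the architecture's computation graph and would be trained, here with the PPO objective, rather than left at its random initialization.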
Findings
HyperPPO scales well: more training resources produce faster convergence to higher-performing architectures.
It produces small neural policies suitable for resource-constrained robots.
The learned policies can effectively control a Crazyflie2.1 quadrotor.
Abstract
Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that uses graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than commonly used networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency, and the user can pick the network architecture that satisfies their computational constraints. We show that our method scales well: more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of…
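Once weights exist for many architectures, choosing one under a memory limit reduces to a filtering step. The sketch below assumes fully connected policies with biases and uses parameter count as a stand-in for the computational constraint. It picks the highest-scoring candidate that fits the budget. `Candidate`, `mlp_param_count`, and `pick_policy` are hypothetical names. The returns in the usage example are made-up placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Candidate:
    hidden: Tuple[int, ...]  # hidden-layer widths of a fully connected policy
    mean_return: float       # evaluation score (placeholder values in the demo)


def mlp_param_count(obs_dim: int, hidden: Sequence[int], act_dim: int, bias: bool = True) -> int:
    """Count weights (and optionally biases) of an MLP obs_dim -> hidden -> act_dim."""
    widths = [obs_dim, *hidden, act_dim]
    total = 0
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        total += n_in * n_out + (n_out if bias else 0)
    return total


def pick_policy(candidates: Sequence[Candidate], obs_dim: int, act_dim: int,
                max_params: int) -> Optional[Candidate]:
    """Return the best-scoring candidate within the parameter budget, or None."""
    feasible = [c for c in candidates
                if mlp_param_count(obs_dim, c.hidden, act_dim) <= max_params]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.mean_return)


if __name__ == "__main__":
    # Hypothetical candidates with placeholder scores, for illustration only.
    candidates = [Candidate((8,), 1.0), Candidate((16, 16), 2.0), Candidate((64, 64), 3.0)]
    print(pick_policy(candidates, obs_dim=4, act_dim=2, max_params=500))
```

With these placeholder numbers, the 500-parameter budget excludes the `(64, 64)` network (4610 parameters) and selects `(16, 16)` (386 parameters). On a real robot the budget would come from the target's memory and compute limits, which this summary does not specify.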
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning
