RANDPOL: Parameter-Efficient End-to-End Quadruped Locomotion via Randomized Policy Learning
Zhuochen Liu, Rahul Jain, Quan Nguyen

TL;DR
RANDPOL introduces a parameter-efficient end-to-end quadruped locomotion controller that keeps the hidden layers fixed at their random initialization and trains only a linear readout, achieving competitive performance with fewer trainable parameters.
Contribution
The paper proposes RANDPOL, a novel randomized policy learning method that significantly reduces trainable parameters while maintaining effective quadruped locomotion control.
Findings
RANDPOL achieves locomotion performance comparable to PPO with fewer trainable parameters.
RANDPOL enables faster learning iterations due to reduced optimization complexity.
Successful zero-shot sim-to-real transfer on a physical quadruped demonstrates practical effectiveness.
Abstract
Modern learning-based locomotion controllers typically rely on fully trainable deep neural networks with a large number of parameters. This paper studies a different design point for end-to-end control: whether effective quadruped locomotion can be achieved with a drastically reduced trainable parameter space. We present RANDomized POlicy Learning (RANDPOL), a policy learning approach in which the hidden layers of the actor and critic are randomly initialized and fixed, while only the final linear readout is trained. This yields a parameter-efficient controller class that retains nonlinear expressiveness through a fixed random basis while substantially reducing the dimension of the optimization problem. RANDPOL is supported by the mathematical foundation of randomized function approximation, which provides a principled basis for using fixed random nonlinear features as expressive…
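The core idea in the abstract, a fixed random hidden layer with a trainable linear readout, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RandomFeaturePolicy`, the tanh nonlinearity, the Gaussian and uniform sampling of the hidden weights, and the supervised toy target (fitting sin(x)) are all assumptions chosen for illustration. The paper trains the actor and critic readouts with reinforcement learning; plain squared-error gradient steps stand in for that here.

```python
import math
import random


class RandomFeaturePolicy:
    """Toy policy (hypothetical): fixed random tanh layer, trainable readout."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are sampled once and never updated.
        self.W = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
                  for _ in range(hidden_dim)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]
        # Only the readout (act_dim x hidden_dim) is trainable; start at zero.
        self.readout = [[0.0] * hidden_dim for _ in range(act_dim)]

    def features(self, obs):
        # Fixed random nonlinear basis evaluated at the observation.
        return [math.tanh(sum(w * x for w, x in zip(row, obs)) + b)
                for row, b in zip(self.W, self.b)]

    def act(self, obs):
        phi = self.features(obs)
        return [sum(r * p for r, p in zip(row, phi)) for row in self.readout]

    def num_trainable(self):
        return sum(len(row) for row in self.readout)

    def sgd_step(self, obs, target, lr=0.01):
        """One squared-error gradient step on the readout only."""
        phi = self.features(obs)
        pred = [sum(r * p for r, p in zip(row, phi)) for row in self.readout]
        for k, row in enumerate(self.readout):
            err = pred[k] - target[k]
            for j in range(len(row)):
                row[j] -= lr * err * phi[j]
        # Loss measured before the update.
        return sum((p - t) ** 2 for p, t in zip(pred, target))


def demo(steps=2000, seed=1):
    """Fit sin(x) on [-2, 2] while leaving the hidden layer untouched."""
    rng = random.Random(seed)
    policy = RandomFeaturePolicy(obs_dim=1, hidden_dim=32, act_dim=1)
    hidden_before = [row[:] for row in policy.W]
    losses = []
    for _ in range(steps):
        x = rng.uniform(-2.0, 2.0)
        losses.append(policy.sgd_step([x], [math.sin(x)]))
    assert policy.W == hidden_before  # hidden layer stayed fixed
    return policy, losses
```

In this sketch the optimizer only ever touches `hidden_dim * act_dim` numbers, and because the output is linear in those numbers, the toy fitting problem is a linear least-squares problem over the fixed features. That is one way to read the abstract's claim that the approach reduces the dimension of the optimization problem while keeping a nonlinear basis.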
