Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion
Nimesh Khandelwal, Shakti S. Gupta

TL;DR
This paper presents a case study of an agent-driven autonomous reinforcement learning system for quadruped locomotion, demonstrating iterative policy improvement across more than 70 experiments with a human supplying only high-level directives.
Contribution
It empirically demonstrates that an agent can autonomously execute the iterative RL research process in robotics, highlighting specific autonomous decisions and engineering adaptations.
Findings
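The agent progressed from early rough-terrain runs with mean reward around 7 to a best logged run (exp063, Wave 12) with velocity error 0.263 and 97% timeout, across more than 70 experiments in fourteen waves.
The agent made key research decisions on its own, such as isolating deadlocks and porting reward terms.
The best run was independently reproduced five times across different GPUs, with the human supplying only high-level directives.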
Abstract
This paper documents a case study in agent-driven autonomous reinforcement learning research for quadruped locomotion. The setting was not a fully self-starting research system. A human provided high-level directives through an agentic coding environment, while an agent carried out most of the execution loop: reading code, diagnosing failures, editing reward and terrain configurations, launching and monitoring jobs, analyzing intermediate metrics, and proposing the next wave of experiments. Across more than 70 experiments organized into fourteen waves on a DHAV1 12-DoF quadruped in Isaac Lab, the agent progressed from early rough-terrain runs with mean reward around 7 to a best logged Wave 12 run, exp063, with velocity error 0.263 and 97% timeout over 2000 iterations, independently reproduced five times across different GPUs. The archive also records several concrete autonomous…
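The wave-structured execution loop described in the abstract (propose a batch of configurations, run them, compare metrics, branch the next wave from the best run) can be illustrated with a minimal Python sketch. This is not the authors' system: `run_waves`, `toy_propose`, `toy_evaluate`, and the `tracking_weight` parameter are all hypothetical names introduced for illustration, the evaluation is a toy function rather than an Isaac Lab training job, and the sketch assumes that lower velocity error is better and that "timeout" is the fraction of episodes that reach the time limit (higher is better).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, float]


@dataclass
class Result:
    exp_id: str
    wave: int
    config: Config
    velocity_error: float  # assumed: lower is better
    timeout_rate: float    # assumed: fraction of episodes reaching the time limit


def is_better(a: Result, b: Optional[Result]) -> bool:
    """Rank by velocity error first, then by timeout rate as a tie-breaker."""
    if b is None:
        return True
    return (a.velocity_error, -a.timeout_rate) < (b.velocity_error, -b.timeout_rate)


def run_waves(
    base: Config,
    propose: Callable[[Config, int], List[Config]],
    evaluate: Callable[[Config], Tuple[float, float]],
    n_waves: int,
) -> Tuple[Result, List[Result]]:
    """Run n_waves batches; each wave branches from the best config so far."""
    best: Optional[Result] = None
    history: List[Result] = []
    current = dict(base)
    for wave in range(1, n_waves + 1):
        for cfg in propose(current, wave):
            vel_err, timeout = evaluate(cfg)
            result = Result(f"exp{len(history) + 1:03d}", wave, cfg, vel_err, timeout)
            history.append(result)
            if is_better(result, best):
                best = result
        current = dict(best.config)  # next wave starts from the incumbent
    return best, history


def toy_propose(config: Config, wave: int) -> List[Config]:
    """Hypothetical proposer: perturb one reward weight, shrinking the step each wave."""
    step = 0.5 / wave
    return [{**config, "tracking_weight": config["tracking_weight"] + d}
            for d in (-step, step)]


def toy_evaluate(config: Config) -> Tuple[float, float]:
    """Toy stand-in for a training run: error is the distance from an ideal weight of 1.0."""
    err = abs(config["tracking_weight"] - 1.0)
    return err, max(0.0, 1.0 - err)


if __name__ == "__main__":
    best, history = run_waves({"tracking_weight": 0.0}, toy_propose, toy_evaluate, n_waves=3)
    print(best.exp_id, best.velocity_error, len(history))
```

In a real setting, the proposer would be the agent's diagnosis and configuration edits, and the evaluator would launch and monitor a training job. The sketch captures only the bookkeeping: numbered experiments, per-wave batches, and a single incumbent that later waves build on.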
