Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion

Nimesh Khandelwal; Shakti S. Gupta

arXiv:2603.27416·cs.RO·March 31, 2026

TL;DR

This paper presents a case study of agent-driven autonomous reinforcement learning research for quadruped locomotion, in which an agent iteratively improved policies across more than 70 experiments while a human provided high-level directives.

Contribution

It empirically demonstrates that an agent can carry out most of the iterative RL research loop in robotics under high-level human direction, highlighting specific autonomous decisions and engineering adaptations.

Findings

01

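Across more than 70 experiments in fourteen waves, the agent progressed from early rough-terrain runs with mean reward around 7 to a best logged run (exp063, Wave 12) with velocity error 0.263 and 97% timeout.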

02

Autonomous system made key research decisions, such as isolating deadlocks and porting reward terms.

03

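The best run was independently reproduced five times across different GPUs, while a human supplied high-level directives and the agent carried out most of the execution loop.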

Abstract

This paper documents a case study in agent-driven autonomous reinforcement learning research for quadruped locomotion. The setting was not a fully self-starting research system. A human provided high-level directives through an agentic coding environment, while an agent carried out most of the execution loop: reading code, diagnosing failures, editing reward and terrain configurations, launching and monitoring jobs, analyzing intermediate metrics, and proposing the next wave of experiments. Across more than 70 experiments organized into fourteen waves on a DHAV1 12-DoF quadruped in Isaac Lab, the agent progressed from early rough-terrain runs with mean reward around 7 to a best logged Wave 12 run, exp063, with velocity error 0.263 and 97% timeout over 2000 iterations, independently reproduced five times across different GPUs. The archive also records several concrete autonomous…
