AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery

Nilesh Jain; Rohit Yadav; Sagar Kotian; Claude AI

arXiv:2603.07300 · cs.LG · March 20, 2026

TL;DR

AutoResearch-RL is a self-supervised reinforcement learning framework that autonomously discovers neural architectures and hyperparameters through perpetual, reward-driven exploration, reporting results competitive with hand-tuned baselines without human intervention.

Contribution

It introduces a novel autonomous RL-based framework for neural architecture search that operates continuously without human supervision, formalizes it as a Markov Decision Process, and provides convergence guarantees.

Findings

01. Matches or surpasses hand-tuned baselines on a nanochat pretraining benchmark.

02. Operates effectively with approximately 300 iterations on a single GPU.

03. Demonstrates the feasibility of perpetual, self-evolving neural architecture discovery.

Abstract

We present AutoResearch-RL, a framework in which a reinforcement learning agent conducts open-ended neural architecture and hyperparameter research without human supervision, running perpetually until a termination oracle signals convergence or resource exhaustion. At each step the agent proposes a code modification to a target training script, executes it under a fixed wall-clock time budget, observes a scalar reward derived from validation bits-per-byte (val-bpb), and updates its policy via Proximal Policy Optimisation (PPO). The key design insight is the separation of three concerns: (i) a frozen environment (data pipeline, evaluation protocol, and constants) that guarantees fair cross-experiment comparison; (ii) a mutable target file (train.py) that represents the agent's editable state; and (iii) a meta-learner (the RL agent itself) that accumulates a growing trajectory of…
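
The propose, execute, observe cycle described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the names `evaluate`, `propose`, `research_loop`, and `Experiment` are hypothetical, a dictionary of hyperparameters stands in for train.py, the evaluator is a synthetic function rather than a real training run, and the reward is assumed to be the improvement in val-bpb over the best value seen so far (the abstract only says the reward is derived from val-bpb). The policy update is replaced by a greedy keep-if-better rule; the standard PPO clipped surrogate is shown separately for a single sample, without any claim about how the paper applies it.

```python
import math
import random
from dataclasses import dataclass


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample (to be maximised)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


@dataclass
class Experiment:
    config: dict      # stand-in for the edited contents of train.py
    val_bpb: float    # metric reported by the frozen evaluator
    reward: float     # assumed reward: improvement over the best val-bpb so far


def evaluate(config: dict, budget_s: float) -> float:
    """Hypothetical frozen evaluator returning a synthetic val-bpb.

    A real evaluator would run the training script for at most `budget_s`
    seconds; here `budget_s` is unused and the metric is a toy function
    of the learning rate with its minimum near lr = 10 ** -2.5.
    """
    return 1.0 + (math.log10(config["lr"]) + 2.5) ** 2


def propose(config: dict, rng: random.Random) -> dict:
    """Hypothetical edit: halve or double the learning rate."""
    candidate = dict(config)
    candidate["lr"] = config["lr"] * rng.choice([0.5, 2.0])
    return candidate


def research_loop(steps: int, seed: int = 0, budget_s: float = 300.0):
    rng = random.Random(seed)
    config = {"lr": 1e-1}                      # mutable state
    best_bpb = evaluate(config, budget_s)      # frozen evaluation protocol
    trajectory = []                            # growing experiment history
    for _ in range(steps):
        candidate = propose(config, rng)
        bpb = evaluate(candidate, budget_s)
        reward = best_bpb - bpb                # positive when val-bpb drops
        trajectory.append(Experiment(candidate, bpb, reward))
        if reward > 0:                         # greedy stand-in for a policy update
            config, best_bpb = candidate, bpb
    return config, best_bpb, trajectory
```

Calling `research_loop(50)` returns the last accepted configuration, its val-bpb, and the full list of attempted experiments. Because the evaluator and its constants never change between iterations, every entry in the trajectory is directly comparable, which is the point of keeping the environment frozen.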

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning