Automatic Generation of High-Performance RL Environments
Seth Karten, Rahul Dev Appapogu, Chi Jin

TL;DR
This paper introduces a closed-loop methodology for automatically generating high-performance reinforcement learning environments with minimal compute, verified for equivalence and efficiency across multiple workflows.
Contribution
The authors present a novel, generic approach combining prompt templates, hierarchical verification, and cross-backend transfer to automate high-performance RL environment creation.
Findings
Achieved environment overhead below 4% of training time at 200M parameters.
Verified equivalence across five diverse environments.
Created TCGJax, the first Pokemon TCG Pocket environment from web specifications.
Abstract
Translating complex reinforcement learning (RL) environments into high-performance implementations has traditionally required months of specialized engineering. We present a closed-loop methodology that produces equivalent high-performance environments for minimal compute cost. Our method uses a generic prompt template, hierarchical verification (property, interaction, and rollout tests), iterative repair, and cross-backend policy transfer to verify no sim-to-sim gap. We demonstrate three distinct workflows across five environments: (1) Direct translation (no prior performance implementation exists) from Game Boy emulator PyBoy to our EmuRust (via Rust IPC) and from Pokemon Showdown to our PokeJAX (via JAX); (2) Translation verified against existing performance implementations via throughput parity with Puffer Pong, MJX and Brax at matched GPU batch sizes; and (3) New environment…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Artificial Intelligence in Games · AI-based Problem Solving and Planning
