Learning Game-Playing Agents with Generative Code Optimization
Zhiyi Kuang, Ryan Rong, YuCheng Yuan, Allen Nie

TL;DR
This paper introduces an approach for creating game-playing agents by evolving Python code with large language models, achieving performance competitive with deep RL baselines while using less training time and fewer environment interactions.
Contribution
It presents a generative optimization method that treats policies as self-evolving code, enabling efficient learning and adaptation in game environments.
Findings
Achieves performance comparable to deep RL baselines on Atari games.
Uses significantly less training time and far fewer environment interactions.
Demonstrates the potential of programmatic policies for complex reasoning.
Abstract
We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with the current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and far fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
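To make the idea of a policy as code concrete, here is a minimal sketch of the loop the abstract describes: a policy that maps an observation to an action, a rollout that records an execution trace, and a revision step driven by textual feedback. This is not the authors' implementation. The toy paddle environment and the names `make_policy`, `rollout`, `summarize`, `propose_revision`, and `optimize` are all hypothetical. In particular, `propose_revision` is a deterministic stand-in for the LLM call: the method described above has the LLM rewrite the program itself, whereas this stub only adjusts one numeric parameter so that the example runs without any external service.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, float]
Policy = Callable[[Observation], str]


def make_policy(deadband: float) -> Policy:
    """Build a policy program: current observation in, action name out."""
    def policy(obs: Observation) -> str:
        offset = obs["ball_y"] - obs["paddle_y"]
        if offset > deadband:
            return "DOWN"  # ball has a larger y than the paddle
        if offset < -deadband:
            return "UP"    # ball has a smaller y than the paddle
        return "NOOP"
    return policy


def rollout(policy: Policy, steps: int = 40) -> Tuple[float, List[str]]:
    """Run a toy, deterministic paddle game (an assumption, not Atari)."""
    ball_y, ball_v, paddle_y = 0.0, 1.0, 5.0
    total = 0.0
    trace: List[str] = []
    for t in range(steps):
        action = policy({"ball_y": ball_y, "paddle_y": paddle_y})
        if action == "UP":
            paddle_y -= 1.0
        elif action == "DOWN":
            paddle_y += 1.0
        ball_y += ball_v
        if ball_y <= 0.0 or ball_y >= 10.0:
            ball_v = -ball_v  # bounce off the walls
        hit = abs(ball_y - paddle_y) <= 1.0
        total += 1.0 if hit else 0.0
        trace.append(f"t={t} action={action} hit={hit}")
    return total, trace


def summarize(trace: List[str]) -> str:
    """Turn an execution trace into short natural-language feedback."""
    misses = sum("hit=False" in line for line in trace)
    return f"missed the ball on {misses} of {len(trace)} steps"


def propose_revision(deadband: float, feedback: str) -> float:
    """Hypothetical stand-in for the LLM; it ignores `feedback` and
    simply halves the parameter. A real system would send the policy
    source, the trace, and the feedback to a model and get new code back."""
    return deadband / 2.0


def optimize(initial_deadband: float, iterations: int = 4) -> Tuple[float, float]:
    """Keep the best-scoring policy seen across a few revisions."""
    current = initial_deadband
    best_param = current
    best_score, trace = rollout(make_policy(current))
    for _ in range(iterations):
        current = propose_revision(current, summarize(trace))
        score, trace = rollout(make_policy(current))
        if score > best_score:
            best_param, best_score = current, score
    return best_param, best_score


if __name__ == "__main__":
    print(optimize(4.0))
```

Keeping the best-scoring candidate, rather than always accepting the latest revision, is one simple design choice for this kind of loop; the summary above does not say which selection rule the paper uses.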
