Learning Game-Playing Agents with Generative Code Optimization

Zhiyi Kuang; Ryan Rong; YuCheng Yuan; Allen Nie

arXiv:2508.19506·cs.LG·August 28, 2025


TL;DR

This paper builds game-playing agents by representing each policy as a Python program and refining that program with large language models. On Atari games, the resulting agents perform competitively with deep RL baselines while using less training time and far fewer environment interactions.

Contribution

It presents a generative optimization method that treats policies as self-evolving code, enabling efficient learning and adaptation in game environments.

Findings

01

Achieves performance comparable to deep RL baselines on Atari games.

02

Uses significantly less training time and far fewer environment interactions.

03

Demonstrates the potential of programmatic policies for complex reasoning.

Abstract

We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with the current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and far fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
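To make the "policy as code" idea concrete, here is a minimal sketch of a policy that maps an observation to an action and is revised between rollouts. It is not the authors' implementation. The toy one-dimensional tracking game, the observation keys `paddle_y` and `ball_y`, the action names, and the functions `rollout`, `propose_revision`, and `optimize` are all assumptions made for illustration. In particular, `propose_revision` is a hard-coded stand-in for the LLM call, and the keep-if-better rule is an assumed selection step that the abstract does not specify.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, int]
Policy = Callable[[Observation], str]


def rollout(policy: Policy, steps: int = 20) -> Tuple[int, List[str]]:
    """Run a toy 1-D tracking game; return (total reward, execution trace)."""
    paddle_y, ball_y = 0, 5  # hypothetical game state, not an Atari emulator
    total, trace = 0, []
    for t in range(steps):
        obs = {"paddle_y": paddle_y, "ball_y": ball_y}
        action = policy(obs)
        if action == "UP":
            paddle_y += 1
        elif action == "DOWN":
            paddle_y -= 1
        reward = 1 if paddle_y == ball_y else 0
        total += reward
        trace.append(f"t={t} obs={obs} action={action} reward={reward}")
    return total, trace


def policy_v0(obs: Observation) -> str:
    """Initial policy: never moves."""
    return "NOOP"


def policy_v1(obs: Observation) -> str:
    """Revised policy: move the paddle toward the ball."""
    if obs["ball_y"] > obs["paddle_y"]:
        return "UP"
    if obs["ball_y"] < obs["paddle_y"]:
        return "DOWN"
    return "NOOP"


def propose_revision(current: Policy, trace: List[str], feedback: str) -> Policy:
    """Stand-in for the LLM: a real system would send the policy source,
    the trace, and the feedback to a model and receive new code back.
    Here the 'revision' is hard-coded so the sketch runs offline."""
    return policy_v1


def optimize(policy: Policy, iterations: int = 3) -> Tuple[Policy, int]:
    """Keep a candidate only if it scores higher (an assumed selection rule)."""
    best_score, trace = rollout(policy)
    for _ in range(iterations):
        feedback = f"Total reward was {best_score}; higher is better."
        candidate = propose_revision(policy, trace, feedback)
        score, candidate_trace = rollout(candidate)
        if score > best_score:
            policy, best_score, trace = candidate, score, candidate_trace
    return policy, best_score


best_policy, best_score = optimize(policy_v0)
print(best_policy.__name__, best_score)
```

Each rollout here costs 20 environment steps, so the whole loop uses well under a hundred interactions. The point of the sketch is only that a code-level revision can change behavior in one step; it makes no claim about the sample counts reported in the paper.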
