Mastering the Game of Go with Self-play Experience Replay

Jingbin Liu; Xuechun Wang

arXiv:2601.03306·cs.AI·January 8, 2026

TL;DR

This paper introduces QZero, a model-free reinforcement learning algorithm that learns to play Go at a level comparable to AlphaGo through self-play and off-policy experience replay, without search during training and without human data.

Contribution

The authors present QZero as the first demonstration that model-free RL can master Go, using off-policy learning and a single Q-network to reach performance comparable to AlphaGo.

Findings

01

QZero reaches a performance level comparable to AlphaGo after 5 months of training on 7 GPUs, starting without human data.

02

It demonstrates the viability of model-free RL for complex strategy games.

03

Off-policy learning effectively trains large-scale Go agents.

Abstract

The game of Go has long served as a benchmark for artificial intelligence, demanding sophisticated strategic reasoning and long-term planning. Previous approaches, such as AlphaGo and its successors, have predominantly relied on model-based Monte-Carlo Tree Search (MCTS). In this work, we present QZero, a novel model-free reinforcement learning algorithm that forgoes search during training and learns a Nash equilibrium policy through self-play and off-policy experience replay. Built upon entropy-regularized Q-learning, QZero utilizes a single Q-value network to unify policy evaluation and improvement. Starting tabula rasa without human data and trained for 5 months with modest compute resources (7 GPUs), QZero achieved a performance level comparable to that of AlphaGo. This demonstrates, for the first time, the efficiency of using model-free reinforcement learning to master the game of…
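The abstract does not spell out QZero's update rule, so the following is only a minimal sketch of generic entropy-regularized (soft) Q-learning with a replay buffer, not the authors' implementation. It illustrates how a single set of Q-values can supply both the policy (a softmax over Q) and the state value (a scaled log-sum-exp of Q). The names `soft_value`, `soft_policy`, `soft_q_target`, `ReplayBuffer`, the temperature `tau`, and the sign flip for the opponent's turn are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def soft_value(q_values, tau):
    """Soft state value V = tau * log(sum(exp(Q / tau))), computed stably."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def soft_policy(q_values, tau):
    """Boltzmann policy pi(a) proportional to exp(Q(a) / tau)."""
    m = max(q_values)
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def soft_q_target(reward, next_q_values, done, tau, gamma=1.0):
    """Bootstrapped target for the player who just moved.

    Assumption: the game is two-player zero-sum and next_q_values are
    expressed from the opponent's perspective, so their soft value is
    negated (a negamax-style convention, not taken from the paper).
    """
    if done:
        return reward
    return reward - gamma * soft_value(next_q_values, tau)


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions for off-policy updates."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng=random):
        # Sample without replacement, capped at the current buffer size.
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    # Hypothetical transition: (reward, next-state Q-values, terminal flag).
    buffer.add((0.0, [0.2, -0.1, 0.4], False))
    buffer.add((1.0, [], True))
    for reward, next_q, done in buffer.sample(2):
        print(soft_q_target(reward, next_q, done, tau=0.5))
```

In a full agent, a network would be regressed toward these targets on sampled batches, and self-play moves would be drawn from `soft_policy`; as `tau` approaches zero the soft value approaches the ordinary maximum over actions.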

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning