Improving Model and Search for Computer Go

Tristan Cazenave

arXiv:2102.03467·cs.AI·April 12, 2021

TL;DR

This paper explores enhancing computer Go performance by replacing residual networks with mobile networks and generalizing the PUCT search algorithm, demonstrating improvements in network efficiency and search effectiveness.

Contribution

The paper proposes improved mobile networks as an alternative to residual networks and a generalization of the PUCT search algorithm, addressing both model efficiency and search strategy in computer Go.

Findings

01 Mobile networks can match or outperform residual networks in Go playing strength.

02 The generalized PUCT algorithm improves search efficiency over the original PUCT.

03 Experimental results show benefits across different network widths and depths.
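
To make the efficiency argument concrete, the sketch below counts convolution weights in a classic two-convolution residual block and in a MobileNetV2-style inverted residual block (1x1 expansion, 3x3 depthwise, 1x1 projection). This is an illustration only: the block layouts, the function names, and the sizes 128 and 384 are assumptions chosen for the example, not the configurations reported in the paper, and biases and normalization parameters are ignored.

```python
def residual_block_weights(channels, kernel=3):
    """Weights in two full kernel x kernel convolutions, channels -> channels.

    Assumed layout: the classic residual block with two 3x3 convolutions.
    """
    return 2 * kernel * kernel * channels * channels


def inverted_residual_weights(channels, expanded, kernel=3):
    """Weights in a MobileNetV2-style inverted residual block (assumed layout)."""
    expand = channels * expanded            # 1x1 convolution: channels -> expanded
    depthwise = kernel * kernel * expanded  # one kernel x kernel filter per channel
    project = expanded * channels           # 1x1 convolution: expanded -> channels
    return expand + depthwise + project


# Hypothetical sizes, chosen only for illustration.
res = residual_block_weights(128)
mob = inverted_residual_weights(128, 384)
print(res, mob, round(res / mob, 2))
```

Under these assumed sizes the mobile-style block uses roughly a third of the weights of the residual block, which gives a sense of why mobile networks can be attractive at a fixed compute or memory budget.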

Abstract

The standard for Deep Reinforcement Learning in games, following Alpha Zero, is to use residual networks and to increase the depth of the network to get better results. We propose to improve mobile networks as an alternative to residual networks and experimentally show the playing strength of the networks according to both their width and their depth. We also propose a generalization of the PUCT search algorithm that improves on PUCT.
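
For context, the baseline PUCT selection rule popularized by AlphaGo Zero and Alpha Zero can be sketched as below. This shows only the standard rule, scoring each child by Q + c_puct * P * sqrt(N_parent) / (1 + N_child). The generalization proposed in the paper is not described on this page and is not reproduced here. The function name, the flat-list representation of children, and the default constant `c_puct=1.5` are assumptions made for the example.

```python
import math


def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Return the index of the child with the highest standard PUCT score.

    priors       -- policy-network prior P for each child
    visit_counts -- visit count N for each child
    value_sums   -- sum of backed-up values W for each child
    c_puct       -- exploration constant (assumed default, not from the paper)
    """
    parent_visits = sum(visit_counts)
    best_index, best_score = 0, -math.inf
    for i, (p, n, w) in enumerate(zip(priors, visit_counts, value_sums)):
        q = w / n if n > 0 else 0.0  # mean value; unvisited children default to 0
        u = c_puct * p * math.sqrt(parent_visits) / (1 + n)  # exploration bonus
        if q + u > best_score:
            best_index, best_score = i, q + u
    return best_index


# Three children: a well-explored move, a promising move, and an unvisited move.
print(puct_select([0.5, 0.3, 0.2], [10, 2, 0], [5.0, 1.4, 0.0]))
```

Raising `c_puct` shifts selection toward children with high priors and few visits, while setting it to 0 reduces the rule to picking the highest mean value.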
