
TL;DR
This paper explores enhancing computer Go performance by replacing residual networks with mobile networks and generalizing the PUCT search algorithm, demonstrating improvements in network efficiency and search effectiveness.
Contribution
It improves mobile networks as an alternative to residual networks and proposes a generalization of the PUCT search algorithm, advancing both model efficiency and search strategy in computer Go.
Findings
Mobile networks can match or outperform residual networks in Go playing strength.
The generalized PUCT algorithm improves search efficiency over the original PUCT.
Experiments report the playing strength of the networks as a function of both their width and their depth.
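To make the efficiency argument concrete, here is a rough weight count for one block of each kind. This is a sketch under stated assumptions, not the paper's architecture: the summary does not specify the mobile block, so the code assumes a MobileNetV2-style inverted residual block (1x1 expansion, 3x3 depthwise convolution, 1x1 projection). The function names and the `expansion` factor are hypothetical, and biases and normalization parameters are ignored.

```python
def residual_block_params(channels: int, kernel: int = 3) -> int:
    """Weights in a classic residual block: two full kernel x kernel convolutions."""
    return 2 * kernel * kernel * channels * channels


def mobile_block_params(channels: int, expansion: int = 4, kernel: int = 3) -> int:
    """Weights in an assumed MobileNetV2-style inverted residual block."""
    hidden = channels * expansion          # width of the expanded representation
    expand = channels * hidden             # 1x1 convolution: channels -> hidden
    depthwise = kernel * kernel * hidden   # one kernel x kernel filter per hidden channel
    project = hidden * channels            # 1x1 convolution: hidden -> channels
    return expand + depthwise + project


if __name__ == "__main__":
    for width in (64, 128, 256):
        print(width, residual_block_params(width), mobile_block_params(width))
```

Under these assumptions, a block at width 128 needs roughly half the weights of a residual block, which is the kind of saving that can be spent on extra width or depth.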
Abstract
The standard for Deep Reinforcement Learning in games, following Alpha Zero, is to use residual networks and to increase the depth of the network to get better results. We propose to improve mobile networks as an alternative to residual networks and experimentally show the playing strength of the networks according to both their width and their depth. We also propose a generalization of the PUCT search algorithm that improves on PUCT.
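For readers unfamiliar with PUCT, the sketch below shows the standard selection rule used in Alpha Zero-style search: each move is scored by its mean value plus an exploration bonus proportional to its prior and to the square root of the parent's visit count. The abstract does not state the form of the generalization, so the `tau` exponent here is only an assumed, illustrative knob (with `tau = 0.5` recovering standard PUCT), and `puct_score`, `select_move`, and the constant `c` are hypothetical names and values.

```python
def puct_score(q, prior, parent_visits, child_visits, c=1.5, tau=0.5):
    """Mean value plus exploration bonus; tau=0.5 gives the usual sqrt(N) term."""
    # Assumption: the exponent tau is an illustrative generalization, not
    # necessarily the formula proposed in the paper.
    exploration = c * prior * parent_visits ** tau / (1 + child_visits)
    return q + exploration


def select_move(children, c=1.5, tau=0.5):
    """Pick the move with the highest score among a node's children.

    children maps a move to a dict with keys 'q', 'prior', and 'visits'.
    """
    parent_visits = sum(child["visits"] for child in children.values())
    return max(
        children,
        key=lambda move: puct_score(
            children[move]["q"],
            children[move]["prior"],
            parent_visits,
            children[move]["visits"],
            c,
            tau,
        ),
    )


if __name__ == "__main__":
    node = {
        "a": {"q": 0.5, "prior": 0.2, "visits": 10},
        "b": {"q": 0.4, "prior": 0.6, "visits": 2},
    }
    print(select_move(node))
```

In this toy node, move "b" is selected despite its lower mean value, because its high prior and low visit count give it a larger exploration bonus.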
