TL;DR
This paper introduces SAI, a Go AI that models winrates across different komi values using a sigmoid function, enhancing self-play training and score estimation on 7x7 Go.
Contribution
It presents a multiple-komi approach in which a two-parameter sigmoid models the winrate as a function of komi, together with self-play training that branches with changed komi when the position is uneven.
Findings
Achieved strong playing agents on 7x7 Go.
Successfully modeled winrate as a function of komi with a sigmoid.
Enabled score difference estimation and game decisiveness evaluation.
Abstract
We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero paradigm. The winrate as a function of the komi is modeled with a two-parameter sigmoid function, so that the neural network must predict just one more variable to assess the winrate for all komi values. A second novel feature is that training is based on self-play games that occasionally branch -- with changed komi -- when the position is uneven. With this setting, reinforcement learning is shown to work on 7x7 Go, obtaining very strong playing agents. As a useful byproduct, the sigmoid parameters given by the network allow one to estimate the score difference on the board and to evaluate how decided the game is.
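To make the sigmoid idea concrete, here is a minimal sketch of a two-parameter winrate curve over komi. The abstract does not give the exact parameterization, so the form below is an assumption: `alpha` is taken to be the komi at which the game is even (an estimate of the score difference on the board, from Black's point of view) and `beta` is the steepness (how decided the game is). The function name `winrate` and the sign convention are illustrative, not taken from the paper.

```python
import math


def winrate(komi: float, alpha: float, beta: float) -> float:
    """Black's estimated winrate at a given komi (assumed parameterization).

    alpha: komi at which the winrate is 50% (estimated score difference).
    beta:  steepness of the curve; larger means a more decided game.
    """
    x = beta * (alpha - komi)
    # Numerically stable logistic function: avoid overflow in exp().
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # With alpha = 9, a komi of 9 makes the game even under this model.
    for komi in (7.0, 9.0, 11.0):
        print(komi, round(winrate(komi, alpha=9.0, beta=0.5), 3))
```

Under this assumed form, a single pair (`alpha`, `beta`) yields a winrate for every komi value, `alpha` reads off directly as a score estimate, and a large `beta` signals that small komi changes would not alter the expected outcome.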
