Score vs. Winrate in Score-Based Games: which Reward for Reinforcement Learning?

Luca Pasqualini; Gianluca Amato; Marco Fantozzi; Rosa Gini; Alessandro Marchetti; Carlo Metta; Francesco Morandin; Maurizio Parton

arXiv:2201.13176·cs.AI·January 10, 2023

TL;DR

This paper investigates why training reinforcement learning agents to optimize score differences, rather than win/lose outcomes, falls short in perfect information games, and it offers both empirical evidence and a theoretical explanation for the resulting suboptimal play.

Contribution

It provides empirical evidence and a theoretical framework explaining why score-based training may lead to suboptimal policies in deterministic, perfect information games.

Findings

1. Score-based training often results in suboptimal policies.

2. Outcome-optimal policies prefer higher score variance in losing states.

3. Deterministic games can behave like nondeterministic ones under approximation.
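
The second finding can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes, purely for illustration, that the final score difference after a move is Gaussian with a given mean and standard deviation, and the names `win_probability`, `mean_score`, and `score_std` are hypothetical. Under that assumption, two moves with the same negative expected score differ in win probability, and the one with the larger spread wins more often.

```python
import math


def win_probability(mean_score: float, score_std: float) -> float:
    """P(final score difference > 0) under an assumed Gaussian score model."""
    if score_std <= 0:
        # Degenerate case: the outcome is fully determined by the mean.
        return 1.0 if mean_score > 0 else 0.0
    z = mean_score / score_std
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Two hypothetical moves in a losing state: same expected score, different spread.
safe_losing = win_probability(mean_score=-5.0, score_std=2.0)
risky_losing = win_probability(mean_score=-5.0, score_std=10.0)

# The same two spreads in a winning state.
safe_winning = win_probability(mean_score=5.0, score_std=2.0)
risky_winning = win_probability(mean_score=5.0, score_std=10.0)

print(f"losing state:  safe={safe_losing:.4f}  risky={risky_losing:.4f}")
print(f"winning state: safe={safe_winning:.4f}  risky={risky_winning:.4f}")
```

In this toy model an agent that maximizes expected score is indifferent between the two moves, while an agent that maximizes win probability prefers the high-variance move when behind and the low-variance move when ahead.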

Abstract

In recent years, the DeepMind algorithm AlphaZero has become the state of the art for efficiently tackling perfect information two-player zero-sum games with a win/lose outcome. However, when the win/lose outcome is decided by a final score difference, AlphaZero may play score-suboptimal moves because all winning final positions are equivalent from the win/lose outcome perspective. This can be an issue, for instance when the agent is used for teaching, or when trying to understand whether there is a better move. Moreover, there is the theoretical quest for the perfect game. A naive approach would be to train an AlphaZero-like agent to predict score differences instead of win/lose outcomes. Since the game of Go is deterministic, this should also produce outcome-optimal play. However, it is a folklore belief that "this does not work". In this paper, we first provide empirical evidence for this…
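
The two reward choices contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the +1/0/-1 convention are assumptions, and the sample final scores are made up.

```python
def winrate_reward(score_diff: float) -> float:
    """Win/lose reward (assumed convention): +1 for a win, -1 for a loss, 0 for a tie."""
    return float((score_diff > 0) - (score_diff < 0))


def score_reward(score_diff: float) -> float:
    """Score reward: the final score difference itself."""
    return score_diff


# Hypothetical final score differences from the agent's point of view.
final_scores = [0.5, 20.5, -0.5]

winrate_rewards = [winrate_reward(s) for s in final_scores]
score_rewards = [score_reward(s) for s in final_scores]

print(winrate_rewards)
print(score_rewards)
```

With the win/lose reward, a half-point win and a twenty-point win receive the same value, which is why an agent trained on it has no incentive to prefer the larger margin.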

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)

Methods: AlphaZero