Optimistic Temporal Difference Learning for 2048

Hung Guei, Lung-Pin Chen, and I-Chen Wu

arXiv:2111.11090·cs.AI·November 23, 2021

TL;DR

This paper applies optimistic initialization to encourage exploration in TD learning for 2048, which significantly improves performance and reduces the network size needed to reach the same performance.

Contribution

The paper proposes optimistic initialization for TD and TC learning in 2048, which yields better exploration, higher scores, and smaller networks than prior methods.

Findings

01. Significant performance improvement with optimistic initialization

02. Reduced network size for achieving high scores

03. State-of-the-art results with combined techniques

Abstract

Temporal difference (TD) learning and its variants, such as multistage TD (MS-TD) learning and temporal coherence (TC) learning, have been successfully applied to 2048. These methods rely on the stochasticity of the environment of 2048 for exploration. In this paper, we propose to employ optimistic initialization (OI) to encourage exploration for 2048, and empirically show that the learning quality is significantly improved. This approach optimistically initializes the feature weights to very large values. Since weights tend to be reduced once the states are visited, agents tend to explore those states which are unvisited or visited few times. Our experiments show that both TD and TC learning with OI significantly improve the performance. As a result, the network size required to achieve the same performance is significantly reduced. With additional tunings such as expectimax search,…
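
The exploration effect described in the abstract can be illustrated with a minimal sketch. This is a toy lookup table with four options and deterministic rewards, not 2048 and not the code in the official repository. The function name `greedy_td`, the reward values, the learning rate, and the initial value of 10 are all assumptions chosen for illustration. The agent is purely greedy, so any exploration it shows comes only from the initial weights.

```python
def greedy_td(init_value, rewards, steps=50, alpha=0.5):
    """Greedy one-step TD updates on a lookup table, with no random exploration.

    Hypothetical toy setup: each index is an option with a fixed reward.
    """
    values = [init_value] * len(rewards)   # the "weights", one per option
    visits = [0] * len(rewards)
    for _ in range(steps):
        # Greedy choice; Python's max returns the first maximal index on ties.
        a = max(range(len(rewards)), key=lambda i: values[i])
        # Move the estimate toward the observed reward (TD-style update).
        values[a] += alpha * (rewards[a] - values[a])
        visits[a] += 1
    return values, visits


REWARDS = [1.0, 2.0, 5.0, 3.0]  # assumed rewards; the best option is index 2

# Zero initialization: the first option looks best after one visit.
zero_values, zero_visits = greedy_td(0.0, REWARDS)

# Optimistic initialization: every unvisited option looks better than it is.
oi_values, oi_visits = greedy_td(10.0, REWARDS)

print("zero init visits:      ", zero_visits)
print("optimistic init visits:", oi_visits)
```

Under these assumed rewards, the zero-initialized agent never leaves the first option, because one visit already lifts its estimate above the untouched zeros. The optimistically initialized agent sees each estimate drop when it visits an option, so it tries every option before settling on the one with the highest reward.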

Code & Models

Repositories

moporgic/tdl2048 (official)
