TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search
Jonathan Baxter, Andrew Tridgell, and Lex Weaver

TL;DR
This paper introduces TDLeaf(lambda), a variation on TD(lambda) that can be used in conjunction with minimax game-tree search. Experiments in chess and backgammon demonstrate its utility; in chess, the program KnightCap improved from a 1650 rating to a 2100 rating in 308 games.
Contribution
The paper presents TDLeaf(lambda), a new algorithm integrating TD(lambda) with minimax search, enabling effective learning of evaluation functions in game-playing AI.
Findings
KnightCap's chess rating on FICS improved from 1650 to 2100 in 308 games
Experiments in both chess and backgammon demonstrate the utility of TDLeaf(lambda)
Results are compared with TD(lambda) and a less radical variant, TD-directed(lambda)
Abstract
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(lambda) and another less radical variant, TD-directed(lambda). In particular, our chess program, "KnightCap," used TDLeaf(lambda) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). It improved from a 1650 rating to a 2100 rating in just 308 games. We discuss some of the reasons for this success and the relationship between our results and Tesauro's results in backgammon.
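To make the idea in the abstract concrete, the following is a minimal sketch of a TDLeaf(lambda)-style weight update after one game. It is an illustration under stated assumptions, not the authors' implementation: it assumes a linear evaluation function, and it assumes the minimax search has already been run elsewhere, so that `leaf_features[t]` holds the feature vector of the principal-variation leaf found from the position at move t. The names `tdleaf_update`, `leaf_features`, `outcome`, `alpha`, and `lam` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def tdleaf_update(
    weights: Sequence[float],
    leaf_features: Sequence[Sequence[float]],
    outcome: float,
    alpha: float = 0.1,
    lam: float = 0.7,
) -> List[float]:
    """Return new weights after one game (sketch; linear evaluation assumed).

    leaf_features[t] is the feature vector of the principal-variation leaf
    reached by searching from the position at move t (assumed precomputed).
    outcome is the game result, used as the value of the final position.
    """

    def evaluate(phi: Sequence[float]) -> float:
        # Linear evaluation: dot product of weights and features (assumption).
        return sum(w * f for w, f in zip(weights, phi))

    n = len(leaf_features)
    values = [evaluate(phi) for phi in leaf_features]
    values[-1] = outcome  # the last value is the actual game result

    # Temporal differences between successive leaf evaluations.
    diffs = [values[t + 1] - values[t] for t in range(n - 1)]

    new_weights = list(weights)
    acc = 0.0  # running sum of lambda-discounted future differences
    for t in range(n - 2, -1, -1):
        acc = diffs[t] + lam * acc
        # For a linear evaluation the gradient w.r.t. weights is the feature vector.
        for i, f in enumerate(leaf_features[t]):
            new_weights[i] += alpha * f * acc
    return new_weights


if __name__ == "__main__":
    updated = tdleaf_update(
        weights=[0.0, 0.0],
        leaf_features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        outcome=1.0,
        alpha=0.5,
        lam=0.5,
    )
    print(updated)
```

The only difference from a plain TD(lambda) update in this sketch is which positions supply the values and gradients: the leaves of the searched principal variations rather than the positions actually played. Setting `lam` to 0 reduces the update to one-step differences only.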
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Reinforcement Learning in Robotics
