TL;DR
This paper introduces an entropy-guided adaptive reasoning framework for LLMs on game-theoretic tasks such as Tic-Tac-Toe. It improves decision quality by scaling reasoning effort to the model's token-level uncertainty.
Contribution
It presents an entropy-aware in-context learning approach that adapts both the number of reasoning paths and the amount of retrieved context, improving LLM performance on sequential decision tasks.
Findings
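Against a sub-optimal algorithmic opponent, entropy-guided adaptive reasoning raises the average game outcome over 100 games from -11.6% (baseline LLM) to +9.5%.
The method keeps the number of LLM queries relatively low while improving decision accuracy.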
A negative correlation exists between token entropy and move optimality.
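The outcome figures above are averages under the scoring stated in the abstract (win = +1, tie = 0, loss = -1), expressed as a percentage. The sketch below shows only that arithmetic; the function name `average_outcome` is hypothetical, and the game tally is invented for illustration and is not the paper's data.

```python
from typing import Iterable

# Outcome encoding stated in the abstract: win = +1, tie = 0, loss = -1.
OUTCOME_SCORE = {"win": 1, "tie": 0, "loss": -1}


def average_outcome(results: Iterable[str]) -> float:
    """Return the mean game outcome as a percentage in [-100, 100]."""
    scores = [OUTCOME_SCORE[r] for r in results]
    if not scores:
        raise ValueError("no games to score")
    return 100.0 * sum(scores) / len(scores)


# Hypothetical tally for illustration only (not the paper's results).
games = ["win"] * 30 + ["tie"] * 50 + ["loss"] * 20
print(f"{average_outcome(games):+.1f}%")  # +10.0%
```

Under this encoding ties contribute nothing, so a positive average means the model wins more games than it loses.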
Abstract
We propose a novel LLM-based framework for reasoning in discrete, game-theoretic tasks, illustrated with Tic-Tac-Toe. The method integrates in-context learning with entropy-guided chain-of-thought (CoT) reasoning and adaptive context retrieval. The model dynamically adjusts both the number of retrieved examples and reasoning paths according to token-level uncertainty: concise reasoning with minimal context is used when uncertainty is low, whereas higher uncertainty triggers expanded multi-path CoT exploration. Experimental evaluation against a sub-optimal algorithmic opponent shows that entropy-aware adaptive reasoning substantially improves decision quality, increasing the average game outcome from -11.6% with the baseline LLM to +9.5% with entropy-guided adaptive reasoning over 100 games (win = +1, tie = 0, loss = -1), while maintaining a relatively low number of LLM…
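The control loop described in the abstract (measure token-level uncertainty, then choose how much context and how many CoT paths to use) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: uncertainty is taken as the mean Shannon entropy of per-token candidate distributions, the thresholds `low`/`high` and the example and path counts are hypothetical, and combining paths by majority vote over proposed moves (cells 0 to 8) is an assumed self-consistency-style rule that the summary does not specify.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one token's candidate distribution, renormalized."""
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def mean_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over a generated answer."""
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


@dataclass
class ReasoningConfig:
    num_examples: int  # in-context examples to retrieve (hypothetical counts)
    num_paths: int     # independent CoT paths to sample


def choose_config(entropy: float, low: float = 0.3, high: float = 0.8) -> ReasoningConfig:
    """Map uncertainty to reasoning effort; thresholds are illustrative assumptions."""
    if entropy < low:
        return ReasoningConfig(num_examples=1, num_paths=1)  # confident: concise
    if entropy < high:
        return ReasoningConfig(num_examples=3, num_paths=3)
    return ReasoningConfig(num_examples=5, num_paths=5)      # uncertain: expand


def aggregate_moves(moves: Sequence[int]) -> int:
    """Assumed aggregation: majority vote over board cells proposed by each path."""
    return Counter(moves).most_common(1)[0][0]


confident = choose_config(mean_entropy([[0.97, 0.02, 0.01]]))
uncertain = choose_config(mean_entropy([[0.4, 0.35, 0.25]]))
print(confident, uncertain, aggregate_moves([4, 4, 0]))
```

With these assumed thresholds, a peaked distribution (entropy of about 0.15 nats) yields the single-path, single-example setting, while a flat one (about 1.08 nats) expands to five paths. This is how such a scheme could keep query counts low on easy positions.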