Large Language Model Guided Tree-of-Thought
Jieyi Long

TL;DR
This paper introduces the Tree-of-Thought (ToT) framework, which improves the problem-solving ability of large language models by mimicking human trial-and-error reasoning: the solution space is explored as a tree, with backtracking when a path fails. The paper reports a significantly higher success rate on Sudoku puzzles.
Contribution
The paper presents a novel Tree-of-Thought framework that augments LLMs with modules for backtracking and multi-round reasoning, inspired by human problem-solving strategies.
Findings
Significantly increased Sudoku solving success rate
Effective backtracking and exploration in reasoning process
Open-source implementation available on GitHub
Abstract
In this paper, we introduce the Tree-of-Thought (ToT) framework, a novel approach aimed at improving the problem-solving capabilities of auto-regressive large language models (LLMs). The ToT technique is inspired by the human mind's approach for solving complex reasoning tasks through trial and error. In this process, the human mind explores the solution space through a tree-like thought process, allowing for backtracking when necessary. To implement ToT as a software system, we augment an LLM with additional modules including a prompter agent, a checker module, a memory module, and a ToT controller. In order to solve a given problem, these modules engage in a multi-round conversation with the LLM. The memory module records the conversation and state history of the problem solving process, which allows the system to backtrack to the previous steps of the thought-process and explore…
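The interplay of the modules named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`propose_next_steps`, `checker`, `solve`), the 4x4 Sudoku board, and the rule-based checker are all assumptions made for illustration, and the LLM plus prompter agent is replaced by a trivial stub that proposes every digit for the first empty cell. What the sketch shows is the control flow only: propose a step, check it, record untried alternatives in memory, and backtrack when a branch is exhausted.

```python
from typing import List, Optional, Tuple

Board = Tuple[int, ...]  # 16 cells in row-major order, 0 means empty
N = 4  # board side length (assumed toy size)
B = 2  # box side length


def propose_next_steps(board: Board) -> List[Board]:
    """Hypothetical stand-in for prompter + LLM: try every digit in the first empty cell."""
    i = board.index(0)
    return [board[:i] + (d,) + board[i + 1:] for d in range(1, N + 1)]


def checker(board: Board) -> bool:
    """Assumed rule-based checker: no digit repeats in any row, column, or box."""
    units = []
    for k in range(N):
        units.append([board[k * N + c] for c in range(N)])  # row k
        units.append([board[r * N + k] for r in range(N)])  # column k
    for br in range(0, N, B):
        for bc in range(0, N, B):
            units.append(
                [board[(br + r) * N + bc + c] for r in range(B) for c in range(B)]
            )
    for unit in units:
        filled = [v for v in unit if v != 0]
        if len(filled) != len(set(filled)):
            return False
    return True


def solve(start: Board, max_steps: int = 10_000) -> Tuple[Optional[Board], int]:
    """Controller loop: expand a node, check it, and backtrack through memory when stuck."""
    # Memory: one list of untried candidate boards per depth of the search tree.
    memory: List[List[Board]] = [[start]]
    steps = 0
    while memory and steps < max_steps:
        candidates = memory[-1]
        if not candidates:
            memory.pop()  # dead end at this depth: backtrack to the parent level
            continue
        board = candidates.pop(0)
        steps += 1
        if not checker(board):
            continue  # rejected by the checker: try the next sibling
        if 0 not in board:
            return board, steps  # complete and valid
        memory.append(propose_next_steps(board))  # go one level deeper
    return None, steps


if __name__ == "__main__":
    puzzle: Board = (1, 0, 0, 4,
                     0, 4, 1, 0,
                     0, 1, 4, 0,
                     4, 0, 0, 1)
    solution, visited = solve(puzzle)
    print(solution, visited)
```

In this sketch the number of visited nodes is returned alongside the result, which is one simple way to measure search cost per puzzle. Replacing `propose_next_steps` with a real LLM call would require parsing and validating the model's output, which is omitted here.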
Peer Reviews
Decision·Submitted to NeurIPS 2023
1. This paper presents an interesting tree of thought method to enable back-tracking in auto-regressive language models. This solves one of the key limitations of LLMs. Similar to humans and inspired by system 2 reasoning, the proposed ToT structure, especially how we can employ a checker to dynamically modify and utilize memory, makes a great contribution to the field, and can inspire future work on how LLMs can be prompted, and even pre-trained.
1. Although the high-level idea of tree-of-thought is promising, with its corresponding ToT controller, agent, and memory, the paper evaluates it only on a single Sudoku task, and the details of the evaluation (e.g., the number of games evaluated, and the computational cost and prompts used compared to the baselines) are not specified. This makes the evaluation results less convincing. Moreover, although the method sounds generalizable, there is no strong evidence on how each module in the framework
1. The motivation for moving from linear reasoning, like Chain-of-thought, to a tree-like searching/reasoning is strong and well recognized. Considering the fundamental limitation of autoregressive generation of GPT-like LLMs, we do need more advanced reasoning/search algorithms for better decoding. 2. The proposed method is reasonable and technically sound. The checker module echoes the recent findings of self-evaluation of LLMs, and the memory module is also useful in agent-based modeling. 3.
1. One of the biggest issues of this paper is the mismatch between the described method and the actual one used in the experiments. The paper spends a lot of space discussing how the ToT controller and prompter agent can be modeled by a policy network and trained via multi-agent RL. But it never tries such a formulation or training in the experiments and only presents them as some kind of future work. Without valid evidence, empirically or theoretically, the method section is largely questiona
The approach is novel. It mixes a general LLM with two neural networks, trained together. The introduction section is good: it identifies two main limitations of using LLMs in complex problem-solving. Sudoku is a complex task that requires backtracking and search, which makes it interesting in the context of ToT. The training of policies is an interesting idea that could be of use in other LLM-based algorithms.
The biggest weakness of this paper is the small number of experiments, which are also conducted on a single task (Sudoku). In the text, many different versions of ToT are discussed; however, experiments are done only for a single setup and a single task. ToT was not tested on any other task, thus we cannot know if it really generalizes at all. There are far too few experimental results and data. What is missing: - How many nodes does ToT need on average to solve a given task? - How many steps basel
Taxonomy
Topics·graph theory and CDMA systems
