TSLM: Tree-Structured Language Modeling for Divergent Thinking

Doyoung Kim; Jaehyeok Doo; Minjoon Seo

arXiv:2601.22688·cs.CL·February 2, 2026

Open Access · 3 Reviews

TL;DR

TSLM introduces a tree-structured approach to language modeling that encodes branching search paths, enabling more efficient and systematic reasoning by internalizing exploration within a single model generation.

Contribution

The paper presents TSLM, a novel tree-structured language model that internalizes systematic exploration, improving inference efficiency and reasoning robustness over traditional sequential models.

Findings

01. TSLM outperforms traditional sequential models on reasoning tasks.

02. TSLM reduces inference time by avoiding multiple independent forward passes.

03. TSLM demonstrates effective internalization of search strategies.
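The efficiency finding can be illustrated with a toy token count. The sketch below is an assumption-laden illustration, not a measurement from the paper: the tree shape, the node names, and the per-node token counts are all made up, and generated tokens serve only as a rough proxy for inference cost. It compares generating every node of a search tree once against generating each root-to-leaf path independently, where shared prefixes are regenerated for every path.

```python
from typing import Dict, List

# Hypothetical toy search tree (node -> children) with made-up token counts.
children: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": [],
    "a1": [],
    "a2": [],
}
tokens: Dict[str, int] = {"root": 10, "a": 5, "b": 5, "a1": 3, "a2": 3}


def tree_cost(node: str) -> int:
    """Tokens generated when every node is produced exactly once."""
    return tokens[node] + sum(tree_cost(c) for c in children[node])


def path_cost(node: str, prefix: int = 0) -> int:
    """Tokens generated when each root-to-leaf path is produced on its own,
    so the tokens of shared ancestors are counted once per path."""
    total = prefix + tokens[node]
    if not children[node]:
        return total
    return sum(path_cost(c, total) for c in children[node])


shared = tree_cost("root")        # 10 + 5 + 5 + 3 + 3 = 26
independent = path_cost("root")   # 18 + 18 + 15 = 51
print(shared, independent)
```

The gap grows with the depth of shared prefixes and the branching factor. The count ignores the branching marker tokens a serialized tree would need and any prefix caching an external search implementation might use, so it is a rough upper bound on the saving rather than an estimate of it.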

Abstract

Language models generate reasoning sequentially, preventing them from decoupling irrelevant exploration paths during search. We introduce Tree-Structured Language Modeling (TSLM), which uses special tokens to encode branching structure, enabling models to generate and selectively expand multiple search paths within a single generation process. By training on complete search trees including both successful and failed attempts, TSLM learns to internalize systematic exploration without redundant recomputation of shared prefixes. TSLM achieves robust performance and superior inference efficiency by avoiding the multiple independent forward passes required by external search methods. These results suggest a new paradigm of inference-time scaling for robust reasoning, demonstrating that supervised learning on complete tree-structured traces provides an efficient alternative for developing…
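To make the idea of encoding branching structure with special tokens concrete, here is a minimal sketch of linearizing a search tree into one token sequence. The marker names (`<branch>`, `</branch>`, `<fail>`, `<goal>`), the `Node` class, and the depth-first layout are assumptions chosen for illustration; the abstract does not specify the paper's actual tokens or serialization format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical marker tokens; the paper's real special tokens may differ.
OPEN, CLOSE, FAIL, GOAL = "<branch>", "</branch>", "<fail>", "<goal>"


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)
    status: str = ""  # "", "fail", or "goal"


def serialize(node: Node) -> List[str]:
    """Depth-first linearization: each node's text appears once, and
    every child subtree is wrapped in OPEN ... CLOSE markers."""
    out = node.text.split()
    if node.status == "fail":
        out.append(FAIL)
    elif node.status == "goal":
        out.append(GOAL)
    for child in node.children:
        out.append(OPEN)
        out.extend(serialize(child))
        out.append(CLOSE)
    return out


# A tiny tree with one failed attempt and one successful path.
tree = Node("start", [
    Node("try A", status="fail"),
    Node("try B", [Node("finish", status="goal")]),
])
serialized = " ".join(serialize(tree))
print(serialized)
# start <branch> try A <fail> </branch> <branch> try B <branch> finish <goal> </branch> </branch>
```

In this layout the shared prefix `start` is written once even though two branches hang off it, and the failed attempt stays in the sequence alongside the successful one, which mirrors the abstract's description of training on complete search trees.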

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. Tree-structured reasoning is serialized into token sequences, so only one model call is required. This differs from the multiple independent model calls used by Tree-of-Thought.
2. The token-based serialization approach is a practical way to achieve tree-structured generation with LLMs.
3. The experiments span both structured and open-ended reasoning tasks.

Weaknesses

1. A time-cost comparison with methods such as Tree-of-Thought is not presented.
2. As shown in Table 1, TSLM does not perform as well as ToT on GSM8K.
3. The token-based tree serialization relies on search trees obtained from other reasoning methods. As such, its performance is also limited by those methods.
4. There are more advanced reasoning methods, such as Graph-of-Thought and Everything of Thoughts.

Reviewer 02 · Rating 4 · Confidence 2

Strengths

The paper is generally clearly written. The proposed tree-structured language modeling paradigm is interesting.

Weaknesses

The method is helpful on tasks where systematic exploration is needed, but I think more open-ended and real-world tasks are more important. As indicated in Table 1, the proposed method has no benefit in these scenarios. Incidentally, the bolded number in the GSM8K row should be the ToT one. The detailed experimental configuration for the ToT baseline is not mentioned. Since the proposed method requires much higher inference-time compute, it is unclear how much the advantage will diminish if you let

Reviewer 03 · Rating 6 · Confidence 4

Strengths

TSLM aims at deployment simplicity. Eliminating external orchestration and multiple model calls is attractive for production where latency and engineering complexity matter. The authors take care to compare conceptually with ToT and with RL, explaining why a supervised approach can be more stable and economical. The stated effect that TSLM helps smaller models close the gap with larger ones is promising for cost-sensitive settings. The conceptual boundary is easy to communicate. External searc

Weaknesses

Related work coverage is incomplete on diversity-first inference and redundancy reduction with memory, which is a very proximate thread to the authors’ efficiency and breadth claims. The current submission neither cites nor contrasts with these works (e.g., Lingam et al., ICLR 2025), which could mislead readers about the frontier on exploration efficiency. This needs correction. Evidence granularity is thin in the visible draft. The claims about improved extrapolation, small-to-large bridging,

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification