FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS
Chaeeun Kim, Seungone Kim

TL;DR
FREESON introduces a novel retriever-free reasoning framework using corpus-traversing MCTS, enabling large reasoning models to independently locate relevant knowledge within the corpus, improving efficiency and accuracy in multi-step question answering.
Contribution
This work presents a new retrieval-augmented reasoning approach that eliminates the need for separate retrievers by integrating corpus traversal into the reasoning process with MCTS, enhancing performance and reducing costs.
Findings
Achieves an average improvement of 14.4% in EM and F1 over models that use separate retrievers.
Performs comparably to or better than strong baselines on five open-domain QA benchmarks.
Effectively handles both single-hop and multi-hop questions with improved accuracy.
Abstract
Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in multi-step reasoning and calling search engines at appropriate steps. However, existing retrieval-augmented reasoning approaches rely on separate retrieval models, limiting the LRM's role in retrieval to deciding when to retrieve and how to query. This separation not only increases hardware and operational costs but also leads to errors in the retrieval process due to the representation bottleneck, a phenomenon where the retriever's embedding space is not expressive enough to meet the generator's requirements. To address this, we shift our perspective from sequence-to-sequence matching to locating the answer-containing paths within the corpus, and propose a novel framework called FREESON (Retriever-FREE Retrieval-Augmented ReaSONing). This framework enables LRMs to retrieve relevant knowledge on their own by…
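The abstract's shift from sequence-to-sequence matching to locating answer-containing paths within the corpus can be made concrete with a toy sketch of corpus-constrained decoding. It makes several assumptions that do not come from the paper. Tokens are whole words, and legality is checked by a naive linear scan rather than a real index. A hypothetical `score` callback stands in for the LRM's preference over next tokens. The function names are illustrative only.

```python
from typing import Callable, List, Sequence, Set

def allowed_next_tokens(docs: Sequence[List[str]], prefix: List[str]) -> Set[str]:
    """Tokens t such that prefix + [t] occurs contiguously in some document."""
    allowed: Set[str] = set()
    k = len(prefix)
    for doc in docs:
        # Each match must leave at least one token to propose as the continuation.
        for i in range(len(doc) - k):
            if doc[i:i + k] == prefix:
                allowed.add(doc[i + k])
    return allowed

def greedy_constrained_decode(docs: Sequence[List[str]], prefix: List[str],
                              score: Callable[[List[str], str], float],
                              max_new: int = 5) -> List[str]:
    """Greedily extend `prefix`, only ever choosing corpus-legal tokens."""
    out = list(prefix)
    for _ in range(max_new):
        options = allowed_next_tokens(docs, out)
        if not options:  # no occurrence of the current span is followed by another token
            break
        # Highest score wins; ties go to the lexicographically largest token.
        out.append(max(options, key=lambda t: (score(out, t), t)))
    return out
```

With `docs = [["paris", "is", "the", "capital", "of", "france"], ["the", "capital", "of", "japan", "is", "tokyo"]]`, the legal continuations of `["capital", "of"]` are `{"france", "japan"}`. A scorer that prefers `"japan"` then forces the rest of the span to be copied verbatim from the second document, so every generated span is guaranteed to exist in the corpus.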
Peer Reviews
Decision: Submitted to ICLR 2026
1. It uses one system instead of a separate retriever, so there’s less to train, tune, and break. 2. The prefix index keeps the model grounded in text that actually exists in the corpus. 3. The CT-MCTS search can explore and recover from early mistakes better than greedy/beam.
1. The compared baselines are somewhat outdated: many newer works, such as WebWalker, WebThinker, ASearcher, and MiroThinker, are not compared against. 2. The paper is written in a way that makes the concrete implementation hard to understand. Details of the method design, e.g., how the index is constructed and why the particular objective was chosen, need further expansion to facilitate understanding. 3. The search latency is in the **25-65 seconds** range, which is too slow for real applications. The auth
- This work presents an interesting idea of indexing documents in a tree structure, which allows a more compact representation of documents by encoding multiple tokens as a single node. The search is carried out over the tree structure, which allows multiple nodes per step for faster inference, with fine-grained control by a separately trained value network.
- Experiments show competitive results against several baselines that combine RAG with reasoning abilities, e.g., Search-R1.
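The reviewer's point about encoding multiple tokens as a single node can be illustrated with a compressed (radix) trie, in which chains of single-child nodes collapse into one multi-token edge. This is a generic data structure offered as an illustration, not a claim about how FREESON's index is actually laid out. The class and function names (`RadixNode`, `insert`, `count_edges`, `contains`) are hypothetical.

```python
from typing import Dict, List, Tuple

class RadixNode:
    """Node of a compressed trie whose edges carry multi-token labels."""
    def __init__(self) -> None:
        # First token of an outgoing edge -> (full edge label, child node).
        self.children: Dict[str, Tuple[List[str], "RadixNode"]] = {}
        self.terminal = False  # True if some inserted sequence ends here

def insert(root: RadixNode, seq: List[str]) -> None:
    node, i = root, 0
    while i < len(seq):
        head = seq[i]
        if head not in node.children:
            leaf = RadixNode()
            leaf.terminal = True
            node.children[head] = (list(seq[i:]), leaf)  # whole remaining suffix on one edge
            return
        label, child = node.children[head]
        j = 0  # length of the shared prefix between the edge label and the rest of seq
        while j < len(label) and i + j < len(seq) and label[j] == seq[i + j]:
            j += 1
        if j < len(label):
            # Split the edge: the shared part now leads to a new middle node.
            mid = RadixNode()
            mid.children[label[j]] = (label[j:], child)
            node.children[head] = (label[:j], mid)
            child = mid
        node, i = child, i + j
    node.terminal = True

def count_edges(node: RadixNode) -> int:
    return sum(1 + count_edges(child) for _, child in node.children.values())

def contains(root: RadixNode, seq: List[str]) -> bool:
    """True if seq is a prefix of some inserted sequence (it may end mid-edge)."""
    node, i = root, 0
    while i < len(seq):
        if seq[i] not in node.children:
            return False
        label, child = node.children[seq[i]]
        n = min(len(label), len(seq) - i)
        if label[:n] != seq[i:i + n]:
            return False
        node, i = child, i + n
    return True
```

Inserting "the capital of france" and "the capital of japan" yields three edges: one shared edge labelled `["the", "capital", "of"]` and two single-token leaves. A plain token-level trie would need five edges. A traversal can therefore consume the shared three-token span in one step.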
- Writing should be improved. Many terms, in particular acronyms, are not clearly defined and could therefore be interpreted arbitrarily, leading to misunderstanding. Examples are ANN in line 119 and LRM in line 154 (which is defined only in the abstract, not in the main text). Also, $\mathcal{R}$ is not defined in line 171.
1. Conceptually interesting idea: Reformulating retrieval as a constrained search directly over the corpus is novel and theoretically appealing. It removes the embedding bottleneck and the need for a separate retriever. 2. Introduction of the CorpusTree abstraction: The paper creatively builds on the FM-Index to define a CorpusTree, which represents the corpus as a prefix-constrained traversal space. This abstraction is elegant and enables efficient legality checking of generated prefixes during
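The review above says the CorpusTree builds on the FM-Index so that generated prefixes can be checked for legality efficiently. The sketch below shows the textbook FM-index backward search that makes such checks possible. It is a simplification: it works on characters rather than model tokens and builds the suffix array naively. The helper names (`build_fm_index`, `count`, `positions`, `legal_next_chars`) are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

def build_fm_index(text: str):
    """Toy FM-index: suffix array, C table, and Occ table over `text`."""
    text += "\0"  # unique sentinel, lexicographically smallest symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix sort
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is the sentinel when i == 0
    alphabet = sorted(set(text))
    # C[c] = number of symbols in text strictly smaller than c.
    C: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        C[ch] = total
        total += text.count(ch)
    # occ[c][k] = occurrences of c in bwt[:k].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for k, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return sa, C, occ

def backward_search(index, pattern: str) -> Tuple[int, int]:
    """Half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    sa, C, occ = index
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):  # the pattern is matched right to left
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi

def count(index, pattern: str) -> int:
    lo, hi = backward_search(index, pattern)
    return hi - lo

def positions(index, pattern: str) -> List[int]:
    lo, hi = backward_search(index, pattern)
    return sorted(index[0][lo:hi])

def legal_next_chars(index, prefix: str, alphabet: str) -> set:
    """Continuations c such that prefix + c still occurs in the text."""
    return {c for c in alphabet if count(index, prefix + c) > 0}
```

For the text `"banana"`, `count(idx, "ana")` is 2 at positions 1 and 3, and the only legal continuation of `"an"` is `"a"`. Each query costs time proportional to the pattern length, independent of corpus size. Because backward search extends a pattern leftwards, this sketch simply re-runs the search for each candidate continuation; a left-to-right decoder could instead index the reversed corpus.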
1. Limited novelty over existing MCTS-based reasoning frameworks: While the idea of corpus traversal is positioned as new, the core algorithm still heavily relies on standard MCTS machinery. The adaptation to prefix-constrained decoding is incremental, and the novelty is primarily in system integration rather than algorithmic advancement. 2. Severe inference-time inefficiency: The CT-MCTS search requires multiple expansions (M=2) and 32 simulations per query, resulting in about 1.88×10^13 FLOPs
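The review above describes CT-MCTS as running 32 simulations per query with M=2 expansions. The generic MCTS loop below makes that cost structure concrete: UCT selection, corpus-constrained expansion, value-function evaluation, and backpropagation. Several parts are assumptions rather than the paper's algorithm. M is read here as "at most M children added per expansion". A hypothetical `value_fn` stands in for the separately trained value network, legality is checked by a naive word-level scan, and the exploration constant `c=1.4` is arbitrary.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Seq = Tuple[str, ...]

def allowed_next(docs: Sequence[Sequence[str]], prefix: Seq) -> List[str]:
    """Tokens that extend `prefix` into a span that still occurs in some document."""
    k = len(prefix)
    out = set()
    for doc in docs:
        for i in range(len(doc) - k):
            if tuple(doc[i:i + k]) == prefix:
                out.add(doc[i + k])
    return sorted(out)  # sorted for a deterministic expansion order

class Node:
    def __init__(self, seq: Seq, parent: Optional["Node"] = None) -> None:
        self.seq, self.parent = seq, parent
        self.children: List["Node"] = []
        self.untried: Optional[List[str]] = None  # legal tokens not yet expanded
        self.visits, self.value_sum = 0, 0.0

def uct(child: Node, parent_visits: int, c: float) -> float:
    if child.visits == 0:
        return math.inf
    exploit = child.value_sum / child.visits
    return exploit + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(docs, root_seq: Seq, value_fn: Callable[[Seq], float],
         n_sims: int = 32, m_expand: int = 2, c: float = 1.4) -> Seq:
    root = Node(root_seq)
    for _ in range(n_sims):
        node = root
        # 1) Selection: descend by UCT while the node is fully expanded.
        while True:
            if node.untried is None:
                node.untried = allowed_next(docs, node.seq)
            if node.untried or not node.children:
                break
            node = max(node.children, key=lambda ch: uct(ch, node.visits, c))
        # 2) Expansion: add up to m_expand corpus-legal children.
        new = []
        for _ in range(min(m_expand, len(node.untried))):
            child = Node(node.seq + (node.untried.pop(0),), node)
            node.children.append(child)
            new.append(child)
        # 3) Evaluation with value_fn (no random rollouts), then 4) backpropagation.
        for leaf in new or [node]:
            v = value_fn(leaf.seq)
            n = leaf
            while n is not None:
                n.visits += 1
                n.value_sum += v
                n = n.parent
    # Read out the most-visited path as the retrieved span.
    best = root
    while best.children:
        best = max(best.children, key=lambda ch: ch.visits)
    return best.seq
```

Starting from `("capital", "of")` over the two toy documents about France and Japan, a value function that rewards spans containing `"tokyo"` steers the search to `("capital", "of", "japan", "is", "tokyo")`. Every simulation calls `value_fn` once per new leaf. This is why the reviewer's FLOP count scales with the number of simulations and the expansion width.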
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications
