MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Zehui Chen; Kuikun Liu; Qiuchen Wang; Jiangning Liu; Wenwei Zhang; Kai Chen; Feng Zhao

arXiv:2407.20183·cs.CL·November 3, 2025·2 cites

TL;DR

MindSearch mimics human cognitive strategies using a multi-agent framework with LLMs to improve web information seeking and integration, significantly enhancing answer quality and efficiency over existing AI search methods.

Contribution

The paper introduces MindSearch, a novel multi-agent LLM-based framework that models human-like multi-step web information seeking and integration, addressing key challenges in current AI search systems.

Findings

01. Achieves search and integration from over 300 web pages in 3 minutes

02. Outperforms ChatGPT-Web and Perplexity.ai in response quality

03. Delivers responses preferred by humans over existing AI search tools

Abstract

Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests often cannot be accurately and completely retrieved by the search engine at once, (2) the corresponding information to be integrated is spread over multiple web pages along with massive noise, and (3) a large number of web pages with long contents may quickly exceed the maximum context length of LLMs. Inspired by the cognitive process when humans solve these problems, we introduce MindSearch to mimic the human minds in web information seeking and integration, which can be instantiated by a simple yet effective LLM-based multi-agent framework. The…
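
The decomposition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated here, so the plan format, `run_plan`, and `stub_search` are hypothetical names chosen for illustration. The sketch assumes a planner has already split a request into sub-questions with dependencies, and that a searcher answers each sub-question while seeing only the answers it depends on, which keeps each call's context small.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical plan format: each sub-question maps to the sub-questions it depends on.
Plan = Dict[str, List[str]]


def run_plan(plan: Plan, search: Callable[[str, Dict[str, str]], str]) -> Dict[str, str]:
    """Answer sub-questions in dependency order.

    Each call to `search` receives only the answers of its direct
    dependencies, not every page or answer gathered so far.
    """
    answers: Dict[str, str] = {}
    for node in TopologicalSorter(plan).static_order():
        context = {dep: answers[dep] for dep in plan.get(node, [])}
        answers[node] = search(node, context)
    return answers


def stub_search(question: str, context: Dict[str, str]) -> str:
    # Stand-in for a searcher agent; a real one would query a search
    # engine and summarize pages with an LLM (assumption, not shown).
    return f"answer({question}; uses {sorted(context)})"


plan: Plan = {
    "Who directed film X?": [],
    "When was that director born?": ["Who directed film X?"],
    "final": ["Who directed film X?", "When was that director born?"],
}
answers = run_plan(plan, stub_search)
```

Sub-questions with no dependency between them could be dispatched concurrently; the sketch runs them sequentially for simplicity.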

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1) The paper demonstrates considerably better output responses for MindSearch than for proprietary AI search engines such as Perplexity Pro and ChatGPT-Web.
2) MindSearch also performs considerably better than the closed-book and ReAct baselines on a variety of multi-hop question-answering datasets.
3) Extensive analysis and evaluation are provided for the WebPlanner prompting strategy, comparing the graph-based methodology with JSON-based and code-based alternatives.

Weaknesses

1) The paper evaluates only final response quality and does not consider the attribution quality of the generated response. Popular AI search engines like Perplexity.ai and ChatGPT-Web also provide citations as part of the generated output. The authors do not discuss whether MindSearch provides any kind of attribution and, if so, what the citation quality looks like (based on automatic evaluations like ALCE [1]).
2) No analysis was provided with regard to the dynamic graph constru

Reviewer 02 · Rating 5 · Confidence 4

Strengths

1. The problem is both interesting and important: building robust and effective multi-agent systems for complex QA tasks.
2. The paper is easy to follow, and the methods are simple and well explained.
3. The experiments include an inference cost analysis, which is well considered.

Weaknesses

1) The work fails to cite and compare to other relevant baselines. For complex QA tasks like HotpotQA or MuSiQue, Self-Ask [1] with search is a relevant baseline. Similarly, SearChain [2] is particularly relevant, as it also forms a global reasoning chain or graph in which the query is decomposed into subquestions that comprise the nodes of the chain; this planning is similar in philosophy to MindSearch. I think AssistantBench [3], released in July 2024, is also very relevant and useful to evaluate on

Reviewer 03 · Rating 6 · Confidence 4

Strengths

S1: The writing and framework of this paper are clear and easy to follow.
S2: The method is novel, utilizing the agents WebPlanner and WebSearcher to perform web search tasks.
S3: Extensive experiments are conducted, demonstrating both the effectiveness and efficiency of this approach.

Weaknesses

W1: In Figure 5, the words should also be accompanied by English translations.
W2: For WebSearcher, how does the LLM select the most valuable pages from all the retrieved web content? More details should be provided. Additionally, regarding answer generation, the statement, "After reading these results, the LLM generates a response to answer the original question based on the search results," requires further elaboration, such as information on input design or specific prompt construction.
W3:

Code & Models

Repositories

internlm/mindsearch · Official

Taxonomy

Topics: Computability, Logic, AI Algorithms