Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations
Harshita Chopra, Chirag Shah

TL;DR
This paper presents a feedback-aware Monte Carlo Tree Search framework that uses LLMs for strategic question generation, significantly improving success rates and efficiency in goal-oriented conversations like medical diagnosis and troubleshooting.
Contribution
The paper introduces a hierarchical feedback mechanism in which cluster-based rewards guide question selection, reducing LLM calls and improving decision-making in conversational systems.
Findings
12% improvement in success rates
10x reduction in LLM calls for planning
8% additional success gain with constrained options
Abstract
Effective decision-making and problem-solving in conversational systems require the ability to identify and acquire missing information through targeted questioning. A key challenge lies in efficiently narrowing down a large space of possible outcomes by posing questions that minimize uncertainty. To address this, we introduce a novel framework that leverages Large Language Models (LLMs) to generate information-seeking questions, with Monte Carlo Tree Search (MCTS) to strategically select questions that maximize information gain, as a part of inference-time planning. Our primary contribution includes a hierarchical feedback mechanism that exploits past interaction patterns to guide future strategy. Specifically, each new problem is mapped to a cluster based on semantic similarity, and our UCT (Upper Confidence bound for Trees) formulation employs a cluster-specific bonus reward to…
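The abstract is cut off before the bonus term is defined, so the following is only a minimal sketch of how a cluster-specific bonus could enter a UCT score, not the authors' formulation. It assumes standard UCT (mean reward plus an exploration term) with an additive bonus looked up by (cluster, question). The names assign_cluster, uct_score, select_child, the weights c and lam, and the cluster_bonus table are all hypothetical, and nearest-centroid cosine similarity stands in for whatever clustering the paper actually uses.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    """One candidate question in the search tree (illustrative structure)."""
    question: str
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_cluster(embedding: Sequence[float],
                   centroids: Sequence[Sequence[float]]) -> int:
    """Assumption: map a new problem to the most similar cluster centroid."""
    return max(range(len(centroids)),
               key=lambda i: cosine(embedding, centroids[i]))


def uct_score(child: Node, parent_visits: int, bonus: float,
              c: float = 1.4, lam: float = 0.5) -> float:
    """Standard UCT plus an assumed additive, cluster-specific bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited questions first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore + lam * bonus


def select_child(parent: Node, cluster_id: int,
                 cluster_bonus: Dict[Tuple[int, str], float]) -> Node:
    """Pick the child question with the highest bonus-adjusted UCT score."""
    return max(
        parent.children,
        key=lambda ch: uct_score(
            ch, parent.visits,
            cluster_bonus.get((cluster_id, ch.question), 0.0)),
    )


# Toy usage: two questions with identical statistics; past interactions in
# cluster 0 favoured the second one, so the bonus breaks the tie.
root = Node("root", visits=10, children=[
    Node("Any cough?", visits=5, total_reward=2.5),
    Node("Any fever?", visits=5, total_reward=2.5),
])
bonus_table = {(0, "Any fever?"): 1.0}
cluster = assign_cluster([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
chosen = select_child(root, cluster, bonus_table)
```

In this toy setup the bonus only matters when the ordinary UCT terms are close; how the paper weights or decays the bonus is not recoverable from the truncated abstract.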
Taxonomy
Topics: Speech and dialogue systems · Advanced Text Analysis Techniques · Cognitive Science and Education Research
Methods: Sparse Evolutionary Training
