The Context Gathering Decision Process: A POMDP Framework for Agentic Search
Chinmaya Kausik, Adith Swaminathan, Nathan Kallus

TL;DR
This paper introduces a POMDP-based framework called the Context Gathering Decision Process (CGDP) for improving agentic search in large environments by refining belief states and halting unproductive searches.
Contribution
It formalizes the CGDP for LLM agents, models behavior as approximate Thompson Sampling, and proposes modular interventions that enhance reasoning and reduce token usage.
Findings
Replacing the agent's implicit state with an explicit CGDP belief state improves multi-hop reasoning by up to 11.4%.
A programmatic exhaustion gate reduces token usage by up to 39% without performance loss.
The framework guides modular improvements to agentic search in complex environments.
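To make the two ideas in the findings concrete, here is a minimal toy sketch, not the paper's implementation. It assumes a small, finite set of candidate locations and a belief that is just a weight per candidate. The Thompson-style step samples one hypothesis from the belief and inspects it, and the exhaustion gate is a plain programmatic check that halts once no candidate has belief mass left. The names `thompson_search`, `candidates`, and `is_relevant` are hypothetical and chosen only for illustration.

```python
import random


def thompson_search(candidates, is_relevant, max_steps=100, seed=0):
    """Toy belief-state search loop with an exhaustion gate (illustrative only)."""
    rng = random.Random(seed)
    # Explicit belief state: unnormalized weight that each candidate
    # holds the context the task needs. Starts uniform (an assumption).
    belief = {c: 1.0 for c in candidates}
    visited = []
    for _ in range(max_steps):
        # Exhaustion gate: halt when no candidate has belief mass left,
        # instead of letting the agent loop over already-checked places.
        live = [c for c, w in belief.items() if w > 0.0]
        if not live:
            return None, visited
        # Thompson-style step: sample one hypothesis from the belief
        # and act as if it were true by inspecting that candidate.
        weights = [belief[c] for c in live]
        choice = rng.choices(live, weights=weights, k=1)[0]
        visited.append(choice)
        if is_relevant(choice):
            return choice, visited
        # The observation rules this candidate out; record it in the belief.
        belief[choice] = 0.0
    return None, visited


if __name__ == "__main__":
    files = ["a.py", "b.py", "c.py", "d.py"]
    found, visited = thompson_search(files, lambda f: f == "c.py")
    print(found, visited)
```

Because ruled-out candidates are zeroed in the belief rather than remembered only implicitly, the loop never revisits a location, and a search with no relevant candidate stops after each location has been inspected once instead of running to `max_steps`.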
Abstract
Large Language Model (LLM) agents are deployed in complex environments -- such as massive codebases, enterprise databases, and conversational histories -- where the relevant state far exceeds their context windows. To navigate these spaces, an agent must iteratively explore the environment to find relevant information. However, without explicit infrastructure, an agent's working memory can degrade into lossy representations of the search state, resulting in redundant work (e.g. repetitive looping) and premature stopping. In this work, we formalize this challenge as the Context Gathering Decision Process (CGDP), a specialized Partially Observable Markov Decision Process, where an agent's objective is to adaptively refine its belief state to isolate the necessary information for a task. We model an LLM's behavior as approximate Thompson Sampling within this CGDP, and introduce a…
