TL;DR
Agent-BRACE introduces a structured belief state representation for LLM agents in long-horizon, partially observable tasks, improving decision-making by explicitly modeling uncertainty with natural language claims.
Contribution
It decouples belief modeling from policy learning, using a structured approximation of environment uncertainty, optimized via reinforcement learning.
Findings
Achieves +14.5% and +5.3% performance improvements over RL baselines.
Maintains near-constant context window regardless of episode length.
Belief calibration improves as evidence accumulates during episodes.
Abstract
Large language models (LLMs) are increasingly deployed on long-horizon tasks in partially observable environments, where they must act while inferring and tracking a complex environment state over many steps. This leads to two challenges: partial observability requires maintaining uncertainty over unobserved world attributes, and long interaction history causes context to grow without bound, diluting task-relevant information. A principled solution to both challenges is a belief state: a posterior distribution over environment states given past observations and actions, which compactly encodes history for decision making regardless of episode length. In LLM agents, however, the open-ended nature of text makes it unclear how to represent such a distribution. Therefore, we introduce Agent-BRACE: Agent Belief state Representation via Abstraction and Confidence Estimation, a method that…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
