UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making

Jinhao Duan; James Diffenderfer; Sandeep Madireddy; Tianlong Chen; Bhavya Kailkhura; Kaidi Xu

arXiv:2506.17419 · cs.CL · June 24, 2025

TL;DR

This paper introduces UProp, a novel uncertainty propagation framework for large language models in multi-step decision-making, improving trustworthiness and performance in safety-critical applications.

Contribution

The paper proposes UProp, an information-theoretic extrinsic uncertainty estimator that effectively quantifies uncertainty propagation in multi-step LLM decision processes.

Findings

1. UProp outperforms existing single-turn uncertainty quantification methods.

2. UProp demonstrates effectiveness across multiple benchmarks and state-of-the-art LLMs.

3. Comprehensive analysis shows UProp's sampling efficiency and potential applications.

Abstract

As Large Language Models (LLMs) are integrated into safety-critical applications involving sequential decision-making in the real world, it is essential to know when to trust LLM decisions. Existing LLM Uncertainty Quantification (UQ) methods are primarily designed for single-turn question-answering formats, leaving multi-step decision-making scenarios, e.g., LLM agentic systems, underexplored. In this paper, we introduce a principled, information-theoretic framework that decomposes LLM sequential decision uncertainty into two parts: (i) internal uncertainty intrinsic to the current decision, which is the focus of existing UQ methods, and (ii) extrinsic uncertainty, a Mutual-Information (MI) quantity describing how much uncertainty should be inherited from preceding decisions. We then propose UProp, an efficient and effective extrinsic uncertainty estimator that converts the…
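The two-part split described in the abstract can be illustrated with a textbook entropy identity, H(Y2) = H(Y2 | Y1) + I(Y1; Y2), where Y1 stands for a preceding decision and Y2 for the current one. The sketch below is only a toy illustration under that assumption: the joint distribution, the variable names, and the mapping of the conditional-entropy term to "intrinsic" and the MI term to "extrinsic" are hypothetical choices made here, and it is not the UProp estimator itself, whose details are cut off in the abstract above.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginalize a joint distribution {(y1, y2): p} onto one variable."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

# Hypothetical joint distribution over a preceding decision y1 and the
# current decision y2. The values are made up for illustration only.
joint = {
    ("a", "x"): 0.4,
    ("a", "y"): 0.1,
    ("b", "x"): 0.1,
    ("b", "y"): 0.4,
}

p_y1 = marginal(joint, 0)
p_y2 = marginal(joint, 1)

# Total uncertainty of the current decision: H(Y2).
total = entropy(p_y2.values())

# "Intrinsic" part (assumed mapping): H(Y2 | Y1) = H(Y1, Y2) - H(Y1).
intrinsic = entropy(joint.values()) - entropy(p_y1.values())

# "Extrinsic" part (assumed mapping): mutual information I(Y1; Y2),
# computed directly from its definition.
extrinsic = sum(
    p * math.log2(p / (p_y1[y1] * p_y2[y2]))
    for (y1, y2), p in joint.items()
    if p > 0
)

print(f"total={total:.4f} intrinsic={intrinsic:.4f} extrinsic={extrinsic:.4f}")
```

In this toy case the current decision depends strongly on the preceding one, so a non-zero share of its total entropy is attributed to the MI term rather than to the conditional entropy.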

Code & Models

Repositories

jinhaoduan/uprop (Official)

Taxonomy

Methods: GPT-4