SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards

Dengjia Zhang; Xiaoou Liu; Lu Cheng; Yaqing Wang; Kenton Murray; Hua Wei

arXiv:2602.21158·cs.LG·February 26, 2026

Open Access

TL;DR

SELAUR introduces a reinforcement learning framework for LLM agents that leverages intrinsic uncertainty signals to improve exploration, learning stability, and success rates in decision-making tasks.

Contribution

SELAUR is presented as the first framework to incorporate token-level uncertainty metrics into reward design for LLM reinforcement learning, with the aim of improving exploration and robustness.

Findings

01. Improves success rates on ALFWorld and WebShop benchmarks.

02. Uncertainty signals enhance exploration and robustness.

03. Ablation studies confirm the effectiveness of uncertainty integration.

Abstract

Large language models (LLMs) are increasingly deployed as multi-step decision-making agents, where effective reward design is essential for guiding learning. Although recent work explores various forms of reward shaping and step-level credit assignment, a key signal remains largely overlooked: the intrinsic uncertainty of LLMs. Uncertainty reflects model confidence, reveals where exploration is needed, and offers valuable learning cues even in failed trajectories. We introduce SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards, a reinforcement learning framework that incorporates uncertainty directly into the reward design. SELAUR integrates entropy-, least-confidence-, and margin-based metrics into a combined token-level uncertainty estimate, providing dense confidence-aligned supervision, and employs a failure-aware reward reshaping mechanism that injects these uncertainty…
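To make the three uncertainty metrics named in the abstract concrete, here is a minimal sketch that computes entropy, least-confidence, and margin scores from a single next-token probability distribution and averages them into one token-level estimate. The function name `token_uncertainty`, the normalization of entropy by log(vocabulary size), the "1 minus margin" convention, and the equal weights are all assumptions made for illustration; the excerpt does not give the paper's exact formulas or weighting, and the failure-aware reward reshaping step is not sketched here because its description is truncated.

```python
import math


def token_uncertainty(probs, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three standard uncertainty scores for one token.

    probs   -- next-token probabilities (at least 2 entries, summing to 1)
    weights -- hypothetical mixing weights; equal weighting is an assumption
    Returns a dict with each score and their weighted combination,
    all scaled so that 0 means fully confident and 1 means maximally unsure.
    """
    if len(probs) < 2:
        raise ValueError("need at least two candidate tokens")

    # Shannon entropy, divided by log(n) so it lies in [0, 1] (assumed scaling).
    entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(len(probs))

    ranked = sorted(probs, reverse=True)
    # Least confidence: how far the top token is from probability 1.
    least_confidence = 1.0 - ranked[0]
    # Margin: a small gap between the top two tokens means high uncertainty.
    margin = 1.0 - (ranked[0] - ranked[1])

    w_e, w_l, w_m = weights
    combined = w_e * entropy + w_l * least_confidence + w_m * margin
    return {
        "entropy": entropy,
        "least_confidence": least_confidence,
        "margin": margin,
        "combined": combined,
    }


if __name__ == "__main__":
    print(token_uncertainty([0.97, 0.01, 0.01, 0.01])["combined"])  # confident
    print(token_uncertainty([0.25, 0.25, 0.25, 0.25])["combined"])  # unsure
```

Under these assumptions a peaked distribution scores near 0 and a uniform one scores near 1, so the combined value could serve as a dense per-token signal of the kind the abstract describes. How such scores are aggregated over steps and trajectories is left open here.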

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Topic Modeling