ReDAct: Uncertainty-Aware Deferral for LLM Agents

Dzianis Piatrashyn; Nikita Kotelevskii; Kirill Grishchenkov; Nikita Glazkov; Ivan Nasonov; Ilya Makarov; Timothy Baldwin; Preslav Nakov; Roman Vashurin; Maxim Panov

arXiv:2604.07036 · cs.CL · April 9, 2026

TL;DR

ReDAct introduces an uncertainty-aware deferral mechanism for LLM agents in sequential decision tasks: a small model acts by default and hands a decision to a larger, more reliable model only when its own uncertainty is high, trading off cost against accuracy.

Contribution

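The paper proposes ReDAct, which pairs a small, cheap LLM with a large, more reliable one and defers a decision to the large model whenever the small model's predictive uncertainty exceeds a calibrated threshold, reducing cost while maintaining decision quality.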
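A compact way to read this contribution, under assumptions this page does not state: the small model is queried at every step (its output is needed to score uncertainty), the large model only on deferred steps, and each call has a fixed cost. Write u_t for the small model's uncertainty at step t, tau for the threshold, c_S and c_L for the per-step costs of the small and large models, and rho for the fraction of deferred steps. All of this notation is introduced here for illustration and is not taken from the paper.

```latex
a_t =
\begin{cases}
  a_t^{S}, & u_t \le \tau \\
  a_t^{L}, & u_t > \tau
\end{cases}
\qquad
\bar{C} = c_S + \rho\, c_L,
\qquad
\bar{C} < c_L \iff \rho < 1 - \frac{c_S}{c_L}
```

For a hypothetical price ratio c_L = 10 c_S, deferring the roughly 15% of steps mentioned in the findings gives an average per-step cost of 0.1 c_L + 0.15 c_L = 0.25 c_L, about a quarter of the large-only cost. Real savings also depend on how many tokens each call consumes, which this simple model ignores.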

Findings

1. Deferring only about 15% of decisions to the large model matches the performance of using the large model exclusively.
2. ReDAct substantially reduces inference cost compared with using only the large model.
3. The approach is effective in text-based embodied environments such as ALFWorld and MiniGrid.
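One simple way to obtain a threshold that defers a chosen share of decisions (for instance about 15%) is to take a quantile of the small model's uncertainty scores on a held-out calibration set. The sketch below assumes exactly that; the quantile rule, the function names calibrate_threshold and deferral_rate, and the toy scores are hypothetical and may differ from how ReDAct actually calibrates its threshold.

```python
import math
from typing import Sequence


def calibrate_threshold(scores: Sequence[float], target_rate: float) -> float:
    """Return a threshold tau such that about `target_rate` of the calibration
    scores are strictly greater than tau (i.e. would be deferred).

    Hypothetical quantile rule for illustration; the paper's calibration
    procedure may differ.
    """
    if not 0.0 <= target_rate <= 1.0:
        raise ValueError("target_rate must lie in [0, 1]")
    if len(scores) == 0:
        raise ValueError("need at least one calibration score")
    ordered = sorted(scores)
    n = len(ordered)
    # Number of calibration steps we are willing to hand to the large model.
    k = round(target_rate * n)
    if k == 0:
        return ordered[-1]   # nothing lies above the maximum: defer nothing
    if k >= n:
        return -math.inf     # every score exceeds -inf: defer everything
    # The k largest scores sit strictly above ordered[n - k - 1]; ties at the
    # threshold can push the realized rate below the target.
    return ordered[n - k - 1]


def deferral_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of steps whose uncertainty exceeds the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


if __name__ == "__main__":
    # Toy calibration scores (made up for illustration, not from the paper).
    calibration_scores = [i / 100 for i in range(1, 101)]
    tau = calibrate_threshold(calibration_scores, target_rate=0.15)
    rate = deferral_rate(calibration_scores, tau)
    print(f"threshold={tau:.2f}, deferral rate={rate:.2f}")
```

Fixing the deferral rate rather than the raw threshold makes the cost budget predictable, at the price of assuming the calibration scores resemble those seen at deployment.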

Abstract

Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems. However, they inherit the tendency of LLMs to hallucinate, leading to incorrect decisions. In sequential settings, even a single mistake can irreversibly degrade the trajectory, making hallucinations an even bigger problem. Although larger LLMs hallucinate less, they incur a significantly higher per-token cost. In this paper, we address this tradeoff by proposing ReDAct (Reason-Defer-Act). In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive model. When the predictive uncertainty of the small model exceeds a calibrated threshold, the decision is deferred to the large model. We evaluate our approach in text-based embodied environments such as ALFWorld and MiniGrid and show that…
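The per-step deferral rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Model interface, the helper names (mean_nll, redact_step, small_stub, large_stub) and the choice of mean negative log-likelihood as the uncertainty score are all hypothetical, and the paper's "Reason" stage is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps an observation string to
# (proposed action, per-token log-probabilities of that action).
Model = Callable[[str], Tuple[str, List[float]]]


def mean_nll(token_logprobs: List[float]) -> float:
    """Average negative log-likelihood of the generated tokens.

    One common uncertainty score; assumed here, not taken from the paper.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


@dataclass
class StepResult:
    action: str
    uncertainty: float
    deferred: bool


def redact_step(observation: str, small: Model, large: Model, threshold: float) -> StepResult:
    """Act with the small model unless its uncertainty exceeds the threshold."""
    action, logprobs = small(observation)
    u = mean_nll(logprobs)
    if u > threshold:
        # Defer: the (expensive) large model decides this step instead.
        large_action, _ = large(observation)
        return StepResult(large_action, u, deferred=True)
    return StepResult(action, u, deferred=False)


# Toy stand-ins for the two LLMs (hypothetical, for illustration only).
def small_stub(obs: str) -> Tuple[str, List[float]]:
    if "door" in obs:
        return "open door", [-0.05, -0.10]   # confident: low NLL
    return "go north", [-1.20, -0.90]        # unsure: high NLL


def large_stub(obs: str) -> Tuple[str, List[float]]:
    return "take key", [-0.20]


if __name__ == "__main__":
    for obs in ["you see a door", "you see a key on the table"]:
        print(obs, "->", redact_step(obs, small_stub, large_stub, threshold=0.5))
```

Note that the small model runs on every step, even deferred ones, because its output is what the uncertainty score is computed from; a deferred step therefore pays for both calls.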
