OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

Sheldon Yu; Junda Wu; Xintong Li; Nikki Lijing Kuang; Sizhe Zhou; Tong Yu; Jiawei Han; Jingbo Shang; Julian McAuley

arXiv:2605.11169 · cs.AI · May 13, 2026

TL;DR

OLIVIA introduces an inference-time adaptation framework for ReAct-style LLM agents, modeling action selection as a contextual bandit to improve decision-making efficiency and reliability during deployment.

Contribution

OLIVIA is the first to explicitly model the action-selection layer of a ReAct-style agent as a contextual bandit, which enables online updates from action-level feedback and explicit uncertainty estimates during deployment.

Findings

1. OLIVIA improves task performance across four benchmarks.
2. It provides explicit uncertainty estimates for actions.
3. OLIVIA achieves these improvements with minimal computational overhead.

Abstract

Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability. Despite this need for deployment-time improvement, existing inference-time adaptation methods for LLM agents mainly rely on prompting or retrieval, which influence behavior indirectly through context manipulation. For ReAct-style agents, such approaches do not expose an explicit decision layer that can score candidate actions, represent uncertainty, or be updated online from action-level feedback. As a result, they provide limited support for trackable, fine-grained, and uncertainty-aware adaptation during deployment. We propose OLIVIA, an inference-time action adaptation framework…
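To make the idea of an explicit decision layer concrete, here is a minimal sketch of a contextual bandit that scores candidate actions, reports an uncertainty width, and updates online from action-level feedback. It is not OLIVIA's implementation: the text above does not say which bandit algorithm the paper uses, so the standard disjoint LinUCB algorithm is swapped in for illustration. The class name `LinUCBActionScorer`, the exploration weight `alpha`, the action names, and the two-dimensional context vector are all hypothetical.

```python
import math


class LinUCBActionScorer:
    """Disjoint LinUCB: one ridge-regression model per candidate action.

    Illustrative stand-in only; not the method described in the paper.
    """

    def __init__(self, actions, dim, alpha=1.0):
        self.dim = dim
        self.alpha = alpha  # hypothetical exploration weight
        # A^-1 starts as the identity (ridge prior), b as the zero vector.
        self.a_inv = {
            a: [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for a in actions
        }
        self.b = {a: [0.0] * dim for a in actions}

    @staticmethod
    def _dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def _matvec(self, m, v):
        return [self._dot(row, v) for row in m]

    def score(self, action, x):
        """Return (estimated reward, uncertainty width) for `action` in context `x`."""
        a_inv = self.a_inv[action]
        theta = self._matvec(a_inv, self.b[action])
        mean = self._dot(theta, x)
        width = math.sqrt(max(self._dot(x, self._matvec(a_inv, x)), 0.0))
        return mean, width

    def select(self, x, candidates):
        """Pick the candidate with the highest upper confidence bound."""
        def ucb(action):
            mean, width = self.score(action, x)
            return mean + self.alpha * width
        return max(candidates, key=ucb)

    def update(self, action, x, reward):
        """Online update from one piece of action-level feedback."""
        a_inv = self.a_inv[action]
        u = self._matvec(a_inv, x)
        denom = 1.0 + self._dot(x, u)
        # Sherman-Morrison rank-one update of (A + x x^T)^-1.
        for i in range(self.dim):
            for j in range(self.dim):
                a_inv[i][j] -= u[i] * u[j] / denom
        b = self.b[action]
        for i in range(self.dim):
            b[i] += reward * x[i]


# Hypothetical usage: three candidate actions, a 2-dimensional context.
scorer = LinUCBActionScorer(["search", "lookup", "finish"], dim=2)
context = [1.0, 0.0]
scorer.update("search", context, reward=1.0)
print(scorer.score("search", context))
print(scorer.select(context, ["search", "lookup", "finish"]))
```

In this sketch the uncertainty width shrinks each time an action is tried in a similar context, so the scorer gradually shifts from exploring untested actions to exploiting those with good feedback. How the context vector would be derived from the agent's reasoning trace is left open here, since the text above does not specify it.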
