OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
Sheldon Yu, Junda Wu, Xintong Li, Nikki Lijing Kuang, Sizhe Zhou, Tong Yu, Jiawei Han, Jingbo Shang, Julian McAuley

TL;DR
OLIVIA introduces an inference-time adaptation framework for ReAct-style LLM agents, modeling action selection as a contextual bandit to improve decision-making efficiency and reliability during deployment.
Contribution
OLIVIA is presented as the first framework to explicitly model the action-selection layer of ReAct-style agents as a contextual bandit, enabling online updates and uncertainty estimation during deployment.
Findings
OLIVIA improves task performance across four benchmarks.
It provides explicit uncertainty estimates for actions.
OLIVIA achieves these improvements with minimal computational overhead.
Abstract
Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability. Despite this need for deployment-time improvement, existing inference-time adaptation methods for LLM agents mainly rely on prompting or retrieval, which influence behavior indirectly through context manipulation. For ReAct-style agents, such approaches do not expose an explicit decision layer that can score candidate actions, represent uncertainty, or be updated online from action-level feedback. As a result, they provide limited support for trackable, fine-grained, and uncertainty-aware adaptation during deployment. We propose OLIVIA, an inference-time action adaptation framework…
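To make the idea of an explicit decision layer concrete, the following is a minimal sketch of a linear contextual bandit with an upper-confidence-bound rule (a standard LinUCB-style technique). It is not taken from the paper and should not be read as OLIVIA's actual algorithm. The class name `LinUCBActionScorer`, the parameters `alpha` and `ridge`, and the assumption that each candidate action arrives as a fixed-length feature vector are all hypothetical choices made for illustration.

```python
import math


class LinUCBActionScorer:
    """Illustrative linear contextual bandit (LinUCB-style); not the paper's method."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight (hypothetical default)
        # Inverse of the regularized design matrix, starting at (ridge * I)^-1.
        self.a_inv = [
            [(1.0 / ridge) if i == j else 0.0 for j in range(dim)]
            for i in range(dim)
        ]
        self.b = [0.0] * dim  # running sum of reward * features

    def _matvec(self, m, v):
        return [sum(row[j] * v[j] for j in range(self.dim)) for row in m]

    def score(self, x):
        """Return (estimated reward, uncertainty width) for feature vector x."""
        a_inv_x = self._matvec(self.a_inv, x)
        theta = self._matvec(self.a_inv, self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean, width

    def select(self, candidates):
        """Return the index of the candidate with the highest mean + alpha * width."""
        ucb = []
        for x in candidates:
            mean, width = self.score(x)
            ucb.append(mean + self.alpha * width)
        return max(range(len(candidates)), key=ucb.__getitem__)

    def update(self, x, reward):
        """Online update from action-level feedback (Sherman-Morrison rank-one step)."""
        a_inv_x = self._matvec(self.a_inv, x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]
```

In a sketch like this, an agent would call `select` on the feature vectors of its candidate actions at each step, execute the chosen action, and pass the observed feedback to `update`. The width returned by `score` is one simple way to expose a per-action uncertainty estimate.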
