Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits

Maksim Pershin; Ivan Golovanov; Pavel Baltabaev; Natalia Trankova

arXiv:2604.14961 · cs.LG · April 17, 2026

TL;DR

This paper reduces cold-start regret in online contextual bandits by having a large language model generate pseudo-observations for unplayed arms, weighted by a calibration-gated schedule that shrinks the LLM's influence when its predictions are inaccurate.

Contribution

It proposes a novel augmentation of Disjoint LinUCB with LLM pseudo-observations and a calibration-gated decay schedule to mitigate cold-start regret.
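Disjoint LinUCB keeps a separate ridge-regression model per arm. The sketch below is a minimal illustration of how weighted pseudo-observations could enter such a learner, assuming they are added as weighted rank-one updates (A_a += w·x·xᵀ, b_a += w·r·x) with real observations at weight 1. The class name `WeightedDisjointLinUCB`, the parameter `alpha`, and this weighting form are hypothetical and not taken from the paper. It maintains A_a⁻¹ directly with the Sherman-Morrison identity, so no matrix inversion is needed per round.

```python
import math


class WeightedDisjointLinUCB:
    """Disjoint LinUCB: one ridge-regression model (A_a, b_a) per arm.

    Hypothetical sketch. Real observations use weight=1.0; LLM
    pseudo-observations use a smaller weight w, entering as
    A_a += w * x x^T and b_a += w * r * x (an assumed weighting form).
    """

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # scale of the upper confidence bound
        self.dim = dim
        # A_a starts as the identity (ridge prior), so A_a^{-1} is also the identity.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
                      for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]

    @staticmethod
    def _dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def score(self, arm, x):
        """UCB score: theta_a^T x + alpha * sqrt(x^T A_a^{-1} x)."""
        A_inv = self.A_inv[arm]
        theta = self._matvec(A_inv, self.b[arm])
        width = math.sqrt(max(self._dot(x, self._matvec(A_inv, x)), 0.0))
        return self._dot(theta, x) + self.alpha * width

    def select(self, x):
        """Return the arm with the highest UCB score (ties go to the lowest index)."""
        scores = [self.score(a, x) for a in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward, weight=1.0):
        """Weighted rank-one update of one arm's model via Sherman-Morrison."""
        if weight <= 0.0:
            return  # a zero weight leaves the model unchanged
        A_inv = self.A_inv[arm]
        Ax = self._matvec(A_inv, x)  # A^{-1} x; A^{-1} is symmetric
        denom = 1.0 + weight * self._dot(x, Ax)
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= weight * Ax[i] * Ax[j] / denom
        self.b[arm] = [bi + weight * reward * xi for bi, xi in zip(self.b[arm], x)]


# One round: play an arm, record its reward, then inject an LLM guess for
# every unplayed arm. The rewards, prediction and weight are placeholders.
bandit = WeightedDisjointLinUCB(n_arms=2, dim=2, alpha=0.5)
x_t = [1.0, 0.0]
played = bandit.select(x_t)
bandit.update(played, x_t, reward=1.0)
for arm in range(2):
    if arm != played:
        bandit.update(arm, x_t, reward=0.2, weight=0.3)
```

Keeping the pseudo-observation weight as a separate argument means the same update path serves both real and LLM-generated data; setting the weight to zero recovers plain Disjoint LinUCB.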

Findings

01. LLM pseudo-observations reduce regret by 19% on MIND-small when paired with a task-specific prompt.
02. Prompt design affects performance more than the choice of decay schedule or gating parameters.
03. The effectiveness of calibration gating depends on the level of LLM prediction error, which determines the bias-variance trade-off of injecting pseudo-observations.

Abstract

Contextual bandit algorithms suffer from high regret during cold start, when the learner has too little data to distinguish good arms from bad ones. We propose augmenting Disjoint LinUCB with LLM pseudo-observations: after each round, a large language model predicts counterfactual rewards for the unplayed arms, and these predictions are injected into the learner as weighted pseudo-observations. The injection weight follows a calibration-gated decay schedule that tracks the LLM's prediction accuracy on played arms with an exponential moving average: high calibration error suppresses the LLM's influence, while accurate predictions receive higher weight during the critical early rounds. We evaluate on two contextual bandit environments, UCI Mushroom (2 arms, asymmetric rewards) and MIND-small (5-arm news recommendation), and find that when equipped with a task-specific prompt, LLM…
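The abstract describes the calibration gate only qualitatively. A minimal sketch, assuming a soft linear gate: an exponential moving average of the absolute error between the LLM's prediction and the observed reward on the played arm scales a geometrically decaying base weight, and the weight falls to zero once that average reaches a threshold `tau`. The class `CalibrationGate` and the parameters `w0`, `decay`, `tau`, `beta` and `init_error` are hypothetical, and the paper's actual schedule may use a different functional form.

```python
class CalibrationGate:
    """Hypothetical calibration-gated decay schedule for pseudo-observation weights.

    ema_error tracks |LLM prediction - observed reward| on the played arm.
    weight(t) = w0 * decay**t * max(0, 1 - ema_error / tau)
    The functional form and all parameter values are illustrative assumptions.
    """

    def __init__(self, w0=0.5, decay=0.99, tau=0.5, beta=0.9, init_error=0.0):
        self.w0 = w0              # base weight of a pseudo-observation at t = 0
        self.decay = decay        # per-round geometric decay of the base weight
        self.tau = tau            # EMA error at which the gate fully closes
        self.beta = beta          # EMA smoothing factor (higher = slower to react)
        self.ema_error = init_error

    def observe(self, llm_prediction, actual_reward):
        """Update the calibration EMA using the arm that was actually played."""
        err = abs(llm_prediction - actual_reward)
        self.ema_error = self.beta * self.ema_error + (1.0 - self.beta) * err

    def weight(self, t):
        """Weight applied to each LLM pseudo-observation in round t."""
        gate = max(0.0, 1.0 - self.ema_error / self.tau)
        return self.w0 * (self.decay ** t) * gate


gate = CalibrationGate(w0=0.5, decay=0.9, tau=0.5, beta=0.5)
gate.observe(llm_prediction=1.0, actual_reward=0.0)  # large miss: EMA error rises to 0.5
print(gate.weight(t=1))  # gate is closed: prints 0.0
```

Under this assumed form, the decay term handles the "early rounds only" intent, while the gate term reacts to measured miscalibration; the returned weight would be applied to every pseudo-observation injected in that round.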