Learning When to Trust in Contextual Bandits

Majid Ghasemi; Mark Crowley

arXiv:2603.13356 · cs.AI · March 17, 2026

TL;DR

This paper introduces CESA-LinUCB, a method that learns context-dependent trust boundaries in robust reinforcement learning, effectively handling evaluators that are truthful in some contexts but biased in others.
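The failure mode can be illustrated with a toy simulation. Everything in it is a hypothetical assumption for illustration, not the paper's model: each context has a scalar `quality` (the true reward) and a boolean `critical` flag, and the evaluator adds a fixed `BIAS` only in critical contexts. Averaging the error over all contexts, as a single global reliability score would, makes the evaluator look mostly trustworthy, while conditioning on the context exposes the bias.

```python
import random

# Hypothetical toy model of Contextual Sycophancy (illustrative values only).
BIAS = 0.8           # assumed size of the strategic bias in critical contexts
CRITICAL_FRAC = 0.1  # assumed fraction of contexts that are critical


def true_reward(context):
    return context["quality"]


def sycophantic_feedback(context):
    # Truthful in benign contexts, biased in critical ones.
    r = true_reward(context)
    return r + BIAS if context["critical"] else r


def simulate(n, seed=0):
    # Returns (is_critical, absolute feedback error) for n random contexts.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ctx = {"quality": rng.random(), "critical": rng.random() < CRITICAL_FRAC}
        err = abs(sycophantic_feedback(ctx) - true_reward(ctx))
        rows.append((ctx["critical"], err))
    return rows


def mean_error(rows, critical=None):
    # critical=None averages over all contexts (a "global" reliability score).
    errs = [e for c, e in rows if critical is None or c == critical]
    return sum(errs) / len(errs)


rows = simulate(10_000)
print(f"global mean error:  {mean_error(rows):.3f}")
print(f"benign contexts:    {mean_error(rows, critical=False):.3f}")
print(f"critical contexts:  {mean_error(rows, critical=True):.3f}")
```

In this toy the `critical` flag is observed directly. In the setting the paper describes, the boundary between benign and critical contexts is unknown and has to be learned from context features, which is the role it assigns to each evaluator's Trust Boundary.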

Contribution

It shows that standard robust methods fail under Contextual Sycophancy and proposes CESA-LinUCB, which learns a context-aware trust boundary for each evaluator and achieves sublinear regret.

Findings

01

CESA-LinUCB achieves $\tilde{O}(\sqrt{T})$ regret.

02

Standard robust methods fail under Contextual Sycophancy, suffering from Contextual Objective Decoupling.

03

CESA-LinUCB recovers the ground truth even when no evaluator is globally reliable.
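To make "sublinear" concrete: under the usual contextual-bandit definition (assumed here, since this listing does not state the paper's exact regret notion), regret after $T$ rounds compares the arms chosen against the best arm under the ground-truth reward $r$, and $\tilde{O}(\cdot)$ hides polylogarithmic factors. Dividing by $T$ shows that the average per-round regret vanishes, whereas a bound of order $T$ would not guarantee this.

```latex
R(T) = \sum_{t=1}^{T} \bigl( r(x_t, a_t^\ast) - r(x_t, a_t) \bigr),
\qquad
R(T) = \tilde{O}(\sqrt{T})
\;\Longrightarrow\;
\frac{R(T)}{T} = \tilde{O}\!\left(\frac{1}{\sqrt{T}}\right) \xrightarrow[T \to \infty]{} 0 .
```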

Abstract

Standard approaches to Robust Reinforcement Learning assume that feedback sources are either globally trustworthy or globally adversarial. In this paper, we challenge this assumption and identify a more subtle failure mode, which we term Contextual Sycophancy: evaluators are truthful in benign contexts but strategically biased in critical ones. We prove that standard robust methods fail in this setting, suffering from Contextual Objective Decoupling. To address this, we propose CESA-LinUCB, which learns a high-dimensional Trust Boundary for each evaluator. We prove that CESA-LinUCB achieves sublinear regret $\tilde{O}(\sqrt{T})$ against contextual adversaries, recovering the ground truth even when no evaluator is globally reliable.
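CESA-LinUCB's name suggests it builds on LinUCB. The sketch below is only the classical disjoint LinUCB algorithm, not the paper's method, and it omits any per-evaluator trust mechanism. It shows the optimism-under-uncertainty step that a trust-aware variant would presumably extend: each arm keeps a ridge-regression estimate of its reward, and the arm with the highest upper confidence bound is played. The `alpha` and `lam` defaults, the synthetic environment, and the `run` helper are assumptions for illustration.

```python
import numpy as np


class LinUCB:
    """Classical disjoint LinUCB: one ridge-regression reward model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha  # width of the confidence bonus (assumed default)
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]  # regularized Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # reward-weighted context sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                             # ridge estimate of arm weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # confidence width in direction x
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def run(T=2000, seed=0):
    # Hypothetical environment: expected reward of arm i is coordinate i of x.
    rng = np.random.default_rng(seed)
    thetas = np.eye(3)
    agent = LinUCB(n_arms=3, dim=3)
    regrets = []
    for _ in range(T):
        x = rng.normal(size=3)
        x /= np.linalg.norm(x)
        means = thetas @ x
        arm = agent.select(x)
        reward = means[arm] + 0.1 * rng.normal()  # a single, truthful feedback signal
        agent.update(arm, x, reward)
        regrets.append(means.max() - means[arm])
    return np.array(regrets)


regrets = run()
print("regret, first 500 rounds:", regrets[:500].sum())
print("regret, last 500 rounds: ", regrets[-500:].sum())
```

Here `reward` stands in for one trustworthy feedback signal. Under Contextual Sycophancy that signal would come from evaluators that are biased in some contexts, and plain LinUCB has no way to detect this: the biased rewards would flow straight into `b` and distort the estimates.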

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)