Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
Youngmin Oh

TL;DR
This paper introduces a robust algorithm for linear dueling bandits with post-serving contexts, unknown feedback delays, and adversarial corruptions, and shows that its regret is near-optimal.
Contribution
It proposes \term, an algorithm that predicts post-serving contexts from pre-serving information and adaptively weights observations to mitigate delays and corruptions, with theoretical guarantees on regret.
Findings
Achieves a regret bound expressed in terms of \(\text{T}\), \(\text{C}\), and \(\text{D}\), with \(\text{C}\) and \(\text{D}\) entering additively.
The algorithm is delay-regime-agnostic and handles unknown stochastic or adversarial delays and corruptions.
Lower bounds nearly match upper bounds, confirming near-optimality in adversarial delay settings.
Abstract
We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget. To address these challenges, we propose \term, which integrates a learned approximator that predicts post-serving contexts from pre-serving information. It further employs an adaptive weighting strategy that clips feature vectors to mitigate the impact of corrupted and delayed observations simultaneously. Under standard regularity conditions and a parametric post-serving mapping, we rigorously establish that our algorithm is delay-regime-agnostic, achieving a regret upper bound of , where is the total feature dimension and …
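The abstract does not state the weighting rule itself. As a rough illustration of what "clipping feature vectors" can look like, the sketch below uses a rule common in corruption-robust linear bandits: an observation with feature difference x gets weight w = min(1, alpha / ||x||_{V^{-1}}), so that no single (possibly corrupted) duel contributes more than alpha to the weighted design matrix. Whether \term uses this exact rule is an assumption; the names `clip_weight`, `rank_one_update`, and the threshold `alpha` are hypothetical and not taken from the paper.

```python
import math


def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inv_norm(V_inv, x):
    """Return ||x||_{V^{-1}} = sqrt(x^T V^{-1} x)."""
    Vx = mat_vec(V_inv, x)
    return math.sqrt(max(0.0, sum(a * b for a, b in zip(x, Vx))))


def clip_weight(V_inv, x, alpha):
    """Assumed weighting rule: w = min(1, alpha / ||x||_{V^{-1}}).

    x is the feature difference between the two duelled arms.
    """
    n = inv_norm(V_inv, x)
    return 1.0 if n <= alpha else alpha / n


def rank_one_update(V_inv, x, w):
    """Sherman-Morrison update of V^{-1} for V <- V + w * x x^T.

    Assumes V_inv is symmetric, which holds when V starts as a
    multiple of the identity and only receives such updates.
    """
    Vx = mat_vec(V_inv, x)
    denom = 1.0 + w * sum(a * b for a, b in zip(x, Vx))
    d = len(x)
    return [
        [V_inv[i][j] - w * Vx[i] * Vx[j] / denom for j in range(d)]
        for i in range(d)
    ]


if __name__ == "__main__":
    V_inv = [[1.0, 0.0], [0.0, 1.0]]  # inverse of V = I (regularizer 1)
    x = [3.0, 4.0]                    # a large feature difference
    w = clip_weight(V_inv, x, alpha=1.0)
    V_inv = rank_one_update(V_inv, x, w)
    print(w, V_inv)
```

In this sketch, a feature difference that is already well covered by past data (small norm under V^{-1}) keeps full weight, while an unusually large one is shrunk before it enters the estimator.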
