Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

TL;DR
This paper introduces a new computationally efficient algorithm for reinforcement learning in adversarially corrupted environments, achieving near-optimal regret bounds with general function approximation in both contextual bandits and Markov decision processes.
Contribution
It develops a novel uncertainty-weighted least-squares approach for general function classes, extending robust RL algorithms beyond linear settings with improved regret bounds.
Findings
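Achieves a regret of $\tilde{O}(\sqrt{T} + \zeta)$ for corrupted contextual bandits, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption.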
Generalizes to episodic MDPs with an additive dependence on the corruption level.
Improves on the regret bounds of existing methods across corruption levels.
Abstract
Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits and a new weighted estimator of uncertainty for the general function class. In contrast to the existing analysis that heavily relies on the linear structure, we develop a novel technique to control the sum of weighted uncertainty, thus establishing the final…
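To make the idea of uncertainty-weighted least-squares regression concrete, here is a minimal sketch in the linear setting that the abstract refers to. It is an illustration under stated assumptions, not the paper's algorithm: the weight rule `min(1, alpha / uncertainty)`, the elliptical uncertainty measure, and the names `weighted_ridge`, `uncertainty`, `alpha`, and `lam` are all hypothetical choices made for this example. The paper's estimator for general function classes is not reproduced here.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def uncertainty(x, cov):
    """Elliptical uncertainty sqrt(x^T cov^{-1} x) of a feature vector x."""
    z = solve(cov, x)
    return math.sqrt(sum(xi * zi for xi, zi in zip(x, z)))


def weighted_ridge(xs, ys, alpha=1.0, lam=1.0):
    """Sequential weighted ridge regression (illustrative assumption).

    Each sample gets weight min(1, alpha / uncertainty), so samples seen in
    poorly explored directions, where a corrupted reward could do the most
    damage, contribute less to the estimate.
    """
    d = len(xs[0])
    cov = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    weights = []
    for x, y in zip(xs, ys):
        u = uncertainty(x, cov)
        w = min(1.0, alpha / u) if u > 0 else 1.0
        weights.append(w)
        # Accumulate the weighted normal equations.
        for i in range(d):
            rhs[i] += w * x[i] * y
            for j in range(d):
                cov[i][j] += w * x[i] * x[j]
    return solve(cov, rhs), weights


if __name__ == "__main__":
    features = [[10.0, 0.0], [0.0, 0.5]]
    rewards = [5.0, 1.0]
    theta_hat, sample_weights = weighted_ridge(features, rewards)
    print("estimate:", theta_hat)
    print("weights:", sample_weights)
```

In this sketch the first sample has a large norm relative to the initial regularizer, so it receives a weight below one, while the second sample is small enough to keep full weight. Capping the influence of high-uncertainty samples is the mechanism that lets the corruption term enter additively rather than multiplying the $\sqrt{T}$ term.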
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
