Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

Chenlu Ye; Wei Xiong; Quanquan Gu; Tong Zhang

arXiv:2212.05949·stat.ML·February 13, 2024

TL;DR

This paper introduces a new computationally efficient algorithm for reinforcement learning in adversarially corrupted environments, achieving near-optimal regret bounds with general function approximation in both contextual bandits and Markov decision processes.

Contribution

It develops a novel uncertainty-weighted least-squares approach for general function classes, extending robust RL algorithms beyond linear settings with improved regret bounds.

Findings

01

Achieves a regret of $\tilde{O}(\sqrt{T} + \zeta)$ for corrupted contextual bandits.

02

Generalizes to episodic MDPs with additive corruption dependence.

03

Outperforms existing methods across all corruption levels.

Abstract

Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T} + \zeta)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits and a new weighted estimator of uncertainty for the general function class. In contrast to the existing analysis that heavily relies on the linear structure, we develop a novel technique to control the sum of weighted uncertainty, thus establishing the final…
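
To make the idea of uncertainty-weighted least-squares regression concrete, here is a minimal Python sketch restricted to the linear special case. It is an illustration only, not the paper's algorithm for general function classes. The function name `uncertainty_weighted_ridge`, the threshold `alpha`, the regularizer `lam`, and the weight rule `min(1, alpha / uncertainty)` are all assumptions chosen for this example, in the spirit of the linear contextual bandit work the abstract refers to.

```python
import numpy as np

def uncertainty_weighted_ridge(X, y, alpha=1.0, lam=1.0):
    """Hypothetical sketch: weighted ridge regression in which each sample is
    down-weighted when its uncertainty (elliptical norm under the current
    covariance) exceeds the threshold alpha. Linear case only."""
    n, d = X.shape
    Sigma = lam * np.eye(d)      # regularized, weighted covariance matrix
    b = np.zeros(d)              # weighted sum of y_t * x_t
    weights = np.empty(n)
    for t in range(n):
        x = X[t]
        # Uncertainty of x under the data seen so far: sqrt(x^T Sigma^{-1} x).
        uncertainty = np.sqrt(x @ np.linalg.solve(Sigma, x))
        # Assumed weight rule: full weight if uncertainty is small,
        # otherwise shrink the weight in proportion to the uncertainty.
        w = min(1.0, alpha / uncertainty) if uncertainty > 0 else 1.0
        weights[t] = w
        Sigma += w * np.outer(x, x)
        b += w * y[t] * x
    theta = np.linalg.solve(Sigma, b)   # weighted ridge estimate
    return theta, weights
```

The intuition behind this design is that a highly uncertain sample is one where a corrupted response could move the estimate the most, so capping its weight limits how much any single corrupted round can influence the fit.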

Videos

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms