Adversarially-Robust TD Learning with Markovian Data: Finite-Time Rates and Fundamental Limits
Sreejeet Maity, Aritra Mitra

TL;DR
This paper addresses the robustness of TD learning in reinforcement learning under adversarial reward corruption, proposing a new algorithm with finite-time guarantees and establishing fundamental limits of robustness.
Contribution
It introduces Robust-TD, a novel algorithm that maintains near-optimal convergence guarantees despite adversarial reward contamination, and establishes a minimax lower bound on the error achievable under such contamination.
Findings
An adversary can bias vanilla TD's estimate to any arbitrary value.
Robust-TD achieves finite-time guarantees that nearly match those of vanilla TD, up to an additive $O(\epsilon)$ term.
A minimax lower bound shows the unavoidable impact of adversarial corruption.
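To make the first finding concrete, here is a minimal sketch that is not the paper's construction: tabular TD(0) on a hypothetical two-state Markov chain, where the transition matrix P, mean rewards R_MEAN, discount GAMMA, step size alpha, and all helper names are illustrative assumptions. An omniscient adversary who corrupts each reward with probability eps picks a replacement reward so that the contaminated mean-path fixed point equals an arbitrary target vector. Robust-TD itself is not reproduced here.

```python
import random

# Hypothetical 2-state Markov chain induced by the fixed policy (not from the paper).
GAMMA = 0.9
P = [[0.9, 0.1],
     [0.2, 0.8]]
R_MEAN = [1.0, 0.0]


def true_values(iters=2000):
    """Uncorrupted value function: fixed point of V = r + GAMMA * P V."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [R_MEAN[s] + GAMMA * sum(P[s][j] * v[j] for j in range(2)) for s in range(2)]
    return v


def adversarial_reward(s, target, eps):
    """Corrupted reward c(s) making `target` the fixed point of the contaminated
    mean-path update: target = (1 - eps) * r + eps * c + GAMMA * P target.
    Assumes an omniscient adversary that knows P, R_MEAN and eps."""
    pv = sum(P[s][j] * target[j] for j in range(2))
    return (target[s] - GAMMA * pv - (1 - eps) * R_MEAN[s]) / eps


def run_td0(eps=0.0, target=None, steps=200_000, alpha=0.005, seed=0):
    """Tabular TD(0) on a single Markovian trajectory; returns the average of
    the iterates over the second half of the run (to damp step-size noise)."""
    rng = random.Random(seed)
    v, avg, s = [0.0, 0.0], [0.0, 0.0], 0
    burn_in = steps // 2
    for t in range(steps):
        s_next = 0 if rng.random() < P[s][0] else 1     # next state from the chain
        r = R_MEAN[s] + rng.gauss(0.0, 0.1)             # clean sub-Gaussian reward
        if target is not None and rng.random() < eps:   # Huber contamination w.p. eps
            r = adversarial_reward(s, target, eps)
        v[s] += alpha * (r + GAMMA * v[s_next] - v[s])  # TD(0) update
        s = s_next
        if t >= burn_in:
            for i in range(2):
                avg[i] += v[i] / (steps - burn_in)
    return avg


print([round(x, 2) for x in true_values()])  # → [7.57, 4.86]
print([round(x, 2) for x in run_td0()])      # close to the true values
print([round(x, 2) for x in run_td0(eps=0.1, target=[-10.0, -10.0])])  # close to the target
```

With eps = 0.1 and target [-10, -10], the averaged estimate lands near the target even though both clean values are positive. In this toy chain the adversary only needs corrupted rewards of -19 (state 0) and -10 (state 1), so the attack does not rely on extreme values.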
Abstract
One of the most basic problems in reinforcement learning (RL) is policy evaluation: estimating the long-term return, i.e., the value function, corresponding to a given fixed policy. The celebrated Temporal Difference (TD) learning algorithm addresses this problem, and recent work has investigated finite-time convergence guarantees for this algorithm and variants thereof. However, these guarantees hinge on the reward observations always being generated from a well-behaved (e.g., sub-Gaussian) true reward distribution. Motivated by harsh, real-world environments where such an idealistic assumption may no longer hold, we revisit the policy evaluation problem from the perspective of adversarial robustness. In particular, we consider a Huber-contaminated reward model where an adversary can arbitrarily corrupt each reward sample with a small probability $\epsilon$. Under this observation model,…
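The observation model, and why it breaks vanilla TD, can be written out as below. This is a sketch in our own notation for the tabular case: the symbols $\bar r$ (mean reward vector), $P$ (transition matrix under the policy), $\gamma$ (discount), $\alpha$ (step size) and $c$ (adversarial reward) are illustrative assumptions, and the paper may instead work with linear function approximation and different notation.

```latex
% Huber-contaminated reward observed at time t (notation assumed)
\[
\tilde r_t =
\begin{cases}
  r_t \sim \mathcal{R}(\cdot \mid s_t), & \text{with probability } 1-\epsilon,\\
  c_t \ \text{(chosen by the adversary)}, & \text{with probability } \epsilon.
\end{cases}
\]

% Tabular TD(0) update driven by the contaminated reward
\[
V_{t+1}(s_t) = V_t(s_t) + \alpha \bigl(\tilde r_t + \gamma V_t(s_{t+1}) - V_t(s_t)\bigr)
\]

% If the adversary plays c_t = c(s_t), the mean-path fixed point satisfies
\[
V = (1-\epsilon)\,\bar r + \epsilon\, c + \gamma P V,
\]
% so, because c is unconstrained, any target V^\star is reached by choosing
\[
c = \frac{V^\star - \gamma P V^\star - (1-\epsilon)\,\bar r}{\epsilon}.
\]
```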
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems · Anomaly Detection Techniques and Applications
