Adversarially-Robust TD Learning with Markovian Data: Finite-Time Rates and Fundamental Limits

Sreejeet Maity; Aritra Mitra

arXiv:2502.04662·cs.LG·February 10, 2025

TL;DR

This paper addresses the robustness of TD learning in reinforcement learning under adversarial reward corruption, proposing a new algorithm with finite-time guarantees and establishing fundamental limits of robustness.

Contribution

It introduces Robust-TD, a novel algorithm that maintains near-optimal convergence guarantees despite adversarial reward contamination, and provides a minimax lower bound for such robustness.

Findings

01

Vanilla TD can be manipulated to any value by an adversary.

02

Robust-TD achieves finite-time guarantees close to those of vanilla TD, up to an additive $O(\epsilon)$ term.

03

A minimax lower bound shows the unavoidable impact of adversarial corruption.
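
To make the first finding concrete, here is a minimal sketch of tabular TD(0) on a toy two-state Markov chain in which each observed reward is overwritten by an adversary with probability ϵ. The chain, the reward values, the step size, and the function name `td0_two_state` are all hypothetical choices made for illustration; this is not the construction used in the paper, and it uses a tabular estimate rather than any function approximation.

```python
import numpy as np


def td0_two_state(num_steps, eps, adversary_reward, gamma=0.9, alpha=0.01, seed=0):
    """Tabular TD(0) on a toy 2-state chain with epsilon-corrupted rewards.

    Every number here (chain, rewards, step size) is a hypothetical choice.
    """
    rng = np.random.default_rng(seed)
    p_to_state_1 = np.array([0.5, 0.5])  # P(next state = 1 | current state)
    true_reward = np.array([1.0, 0.0])   # mean reward in each state
    V = np.zeros(2)
    s = 0
    for _ in range(num_steps):
        s_next = int(rng.random() < p_to_state_1[s])
        reward = true_reward[s] + 0.1 * rng.standard_normal()
        # With probability eps the adversary overwrites the observed reward.
        if rng.random() < eps:
            reward = adversary_reward
        # Standard TD(0) update toward the one-step bootstrapped target.
        V[s] += alpha * (reward + gamma * V[s_next] - V[s])
        s = s_next
    return V


if __name__ == "__main__":
    print("clean    :", td0_two_state(50_000, eps=0.0, adversary_reward=100.0))
    print("corrupted:", td0_two_state(50_000, eps=0.05, adversary_reward=100.0))
```

In this toy setting the expected observed reward in state s becomes (1 - ϵ) r(s) + ϵ C, where C is the adversary's value, so the value estimate drifts further from the clean one as C grows, even though only a small fraction of samples is corrupted.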

Abstract

One of the most basic problems in reinforcement learning (RL) is policy evaluation: estimating the long-term return, i.e., value function, corresponding to a given fixed policy. The celebrated Temporal Difference (TD) learning algorithm addresses this problem, and recent work has investigated finite-time convergence guarantees for this algorithm and variants thereof. However, these guarantees hinge on the reward observations being always generated from a well-behaved (e.g., sub-Gaussian) true reward distribution. Motivated by harsh, real-world environments where such an idealistic assumption may no longer hold, we revisit the policy evaluation problem from the perspective of adversarial robustness. In particular, we consider a Huber-contaminated reward model where an adversary can arbitrarily corrupt each reward sample with a small probability $ϵ$. Under this observation model,…
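
The Huber-contaminated reward model described in the abstract can be sketched in a few lines: each sample is drawn from the true distribution, then replaced by an adversarial value with probability ϵ. The sketch below also compares the plain sample mean with a median-of-means estimate, a standard tool from robust statistics. It is shown only as a generic illustration; this page does not describe the internals of Robust-TD, and the samples here are i.i.d. rather than Markovian. The function names, the Gaussian reward distribution, and the constant `corrupt_value` are assumptions.

```python
import numpy as np


def huber_contaminated_rewards(n, eps, true_mean=1.0, corrupt_value=1e3, seed=0):
    """Draw n rewards; each is replaced by corrupt_value with probability eps."""
    rng = np.random.default_rng(seed)
    clean = true_mean + rng.standard_normal(n)   # assumed clean distribution
    corrupted_mask = rng.random(n) < eps         # which samples the adversary hits
    return np.where(corrupted_mask, corrupt_value, clean)


def median_of_means(x, num_blocks):
    """Split x into equal blocks, average each block, return the median."""
    x = np.asarray(x, dtype=float)
    usable = (len(x) // num_blocks) * num_blocks  # drop the ragged tail
    block_means = x[:usable].reshape(num_blocks, -1).mean(axis=1)
    return float(np.median(block_means))


if __name__ == "__main__":
    rewards = huber_contaminated_rewards(2000, eps=0.01)
    print("sample mean    :", float(np.mean(rewards)))
    print("median of means:", median_of_means(rewards, num_blocks=200))
```

The block count matters: the median is only protected while fewer than half of the blocks contain a corrupted sample, so the number of blocks should comfortably exceed twice the expected number of corrupted samples.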

Taxonomy

Topics: Neural Networks and Applications · Fault Detection and Control Systems · Anomaly Detection Techniques and Applications