Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks

Junlin Wu; Hussein Sibai; Yevgeniy Vorobeychik

arXiv:2212.14115 · cs.LG · January 2, 2023

TL;DR

This paper introduces a framework for certifying safety in reinforcement learning under adversarial input perturbations. It treats safety as a property of the true state in partially observable Markov decision processes (POMDPs) and uses true state information during training.

Contribution

It presents the first method for certifying the safety of partially-supervised reinforcement learning (PSRL) policies against adversarial input perturbations, and introduces two adversarial training approaches that use knowledge of the true state.

Findings

01. Effective safety certification in adversarial environments.

02. Improved safety guarantees with high nominal reward.

03. Enhanced true state prediction accuracy.

Abstract

Function approximation has enabled remarkable advances in applying reinforcement learning (RL) techniques in environments with high-dimensional inputs, such as images, in an end-to-end fashion, mapping such inputs directly to low-level control. Nevertheless, such end-to-end policies have proved vulnerable to small adversarial input perturbations. A number of approaches for improving or certifying robustness of end-to-end RL to adversarial perturbations have emerged as a result, focusing on cumulative reward. However, what is often at stake in adversarial scenarios is the violation of fundamental properties, such as safety, rather than the overall reward that combines safety with efficiency. Moreover, properties such as safety can only be defined with respect to true state, rather than the high-dimensional raw inputs to end-to-end policies. To disentangle nominal efficiency and adversarial safety, we…

Taxonomy

Topics: Adversarial Robustness in Machine Learning