Robust off-policy Reinforcement Learning via Soft Constrained Adversary

Kosuke Nakanishi; Akihiro Kubo; Yuji Yasui; Shin Ishii

arXiv:2409.00418·cs.LG·September 4, 2024

TL;DR

This paper introduces a novel robust off-policy reinforcement learning approach using soft constrained adversaries based on f-divergence, addressing long-term horizon limitations and incorporating prior knowledge of perturbation distributions.

Contribution

It proposes a new adversarial RL framework with f-divergence constraints, enabling effective long-term robustness and leveraging prior distribution knowledge.

Findings

01. Achieves high sample efficiency in off-policy RL.

02. Demonstrates robustness against adversarial perturbations.

03. Outperforms existing methods in experimental evaluations.

Abstract

Recently, reinforcement learning (RL) methods that are robust to perturbations of input observations have attracted significant attention and evolved rapidly, owing to RL's potential vulnerability. Although these methods have achieved reasonable success, they have two limitations when the adversary is considered over long-term horizons. First, the mutual dependency between the policy and its corresponding optimal adversary limits the development of off-policy RL algorithms: because the optimal adversary depends on the current policy, applications to off-policy RL have been restricted. Second, these methods generally assume perturbations based only on the $L_{p}$-norm, even when prior knowledge of the perturbation distribution in the environment is available. We here introduce another perspective on adversarial RL: an f-divergence constrained problem with the prior knowledge…
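The abstract is cut off before the method is described, so the following is only a hedged illustration of the general idea of a divergence-constrained adversary, not the authors' algorithm. It assumes the KL divergence as the f-divergence, a finite set of candidate perturbations, and a known prior over them. Under those assumptions, an adversary that minimizes the expected return plus `alpha` times KL(p || prior) has the closed form p proportional to prior * exp(-value / alpha). The function name `soft_worst_case` and all parameters are hypothetical.

```python
import math

def soft_worst_case(prior, values, alpha):
    """KL-regularized adversary over a finite set of perturbations (illustrative).

    prior  : strictly positive prior probabilities of each candidate perturbation
    values : the agent's return estimate under each perturbation (assumed given)
    alpha  : temperature; small alpha approaches the hard worst case,
             large alpha keeps the adversary close to the prior
    Returns (adversary distribution, soft worst-case value).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Unnormalized log-probabilities: log prior - value / alpha.
    logits = [math.log(p) - v / alpha for p, v in zip(prior, values)]
    m = max(logits)  # shift for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    adversary = [w / z for w in weights]
    # min_p E_p[v] + alpha * KL(p || prior) = -alpha * log sum_i prior_i * exp(-v_i / alpha)
    soft_value = -alpha * (m + math.log(z))
    return adversary, soft_value

# Hypothetical usage: two perturbations with equal prior mass.
adv_sharp, val_sharp = soft_worst_case([0.5, 0.5], [1.0, 0.0], alpha=1e-3)
adv_soft, val_soft = soft_worst_case([0.5, 0.5], [1.0, 0.0], alpha=1e3)
```

With a small `alpha` the adversary concentrates on the perturbation with the lowest return; with a large `alpha` it stays near the prior and the soft value approaches the prior-weighted mean return.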

Taxonomy

Topics: Extremum Seeking Control Systems · Advanced Control Systems Optimization · Adaptive Dynamic Programming Control

Methods: Softmax · Attention Is All You Need