Robust off-policy Reinforcement Learning via Soft Constrained Adversary
Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii

TL;DR
This paper introduces a novel robust off-policy reinforcement learning approach using soft constrained adversaries based on f-divergence, addressing long-term horizon limitations and incorporating prior knowledge of perturbation distributions.
Contribution
It proposes a new adversarial RL framework with f-divergence constraints, enabling effective long-term robustness and leveraging prior distribution knowledge.
Findings
Achieves high sample efficiency in off-policy RL.
Demonstrates robustness against adversarial perturbations.
Outperforms existing methods in experimental evaluations.
Abstract
Recently, robust reinforcement learning (RL) methods against perturbations of input observations have garnered significant attention and undergone rapid evolution due to RL's potential vulnerability. Although these advanced methods have achieved reasonable success, there have been two limitations when considering an adversary in terms of long-term horizons. First, the mutual dependency between the policy and its corresponding optimal adversary limits the development of off-policy RL algorithms; because obtaining the optimal adversary depends on the current policy, applications to off-policy RL have been restricted. Second, these methods generally assume perturbations based only on the L_p-norm, even when prior knowledge of the perturbation distribution in the environment is available. We here introduce another perspective on adversarial RL: an f-divergence constrained problem with the prior knowledge…
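To make the idea of a soft, divergence-constrained adversary concrete, here is a minimal sketch and not the authors' algorithm. It assumes the KL divergence as one particular f-divergence, a finite set of candidate perturbations, and a penalty weight `alpha`. The function name `soft_adversary` and its arguments are hypothetical. Under those assumptions, the adversary that minimizes the expected value plus `alpha` times its KL divergence from the prior has the standard closed form: the prior reweighted by `exp(-Q / alpha)`.

```python
import math


def soft_adversary(prior, q_values, alpha):
    """KL-regularized worst-case distribution over candidate perturbations.

    Solves  min_q  sum_i q[i] * q_values[i] + alpha * KL(q || prior).
    Assumptions (illustrative only): finite perturbation set, strictly
    positive prior probabilities, KL as the chosen f-divergence.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Unnormalized log-weights: log prior[i] - Q[i] / alpha.
    logits = [math.log(p) - q / alpha for p, q in zip(prior, q_values)]
    m = max(logits)  # subtract the max before exponentiating for stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    adversary = [w / z for w in weights]
    # Optimal objective value: -alpha * log E_prior[exp(-Q / alpha)].
    soft_value = -alpha * (m + math.log(z))
    return adversary, soft_value


# Hypothetical numbers: three candidate perturbations and the agent's value under each.
prior = [0.25, 0.5, 0.25]
q_values = [1.0, 2.0, 0.0]
adversary, soft_value = soft_adversary(prior, q_values, alpha=1.0)
```

A large `alpha` keeps the adversary close to the prior, while a small `alpha` concentrates it on the perturbation with the lowest value, which recovers a hard worst-case attack in the limit.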
Taxonomy
Topics: Extremum Seeking Control Systems · Advanced Control Systems Optimization · Adaptive Dynamic Programming Control
Methods: Softmax · Attention Is All You Need
