Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation

Kosuke Nakanishi; Akihiro Kubo; Yuji Yasui; Shin Ishii

arXiv:2506.16753 · cs.LG · June 23, 2025


TL;DR

This paper introduces an off-policy reinforcement learning method that enhances robustness against adversarial observations by reformulating the problem as a soft-constrained optimization, supported by symmetric policy evaluation theory.

Contribution

The paper presents an off-policy approach to adversarial RL that removes the need for additional environment interactions, using symmetric policy evaluation as theoretical support for its robustness.

Findings

01. Eliminates additional environment interactions in adversarial training
02. Uses symmetric policy evaluation as theoretical support for the approach
03. Demonstrates improved robustness in RL agents

Abstract

Recently, robust reinforcement learning (RL) methods designed to handle adversarial input observations have received significant attention, motivated by RL's inherent vulnerabilities. While existing approaches have demonstrated reasonable success, addressing worst-case scenarios over long time horizons requires both training adversaries to minimize the agent's cumulative reward and training agents to counteract them, through alternating learning. However, this process introduces mutual dependencies between the agent and the adversary, making interactions with the environment inefficient and hindering the development of off-policy methods. In this work, we propose a novel off-policy method that eliminates the need for additional environmental interactions by reformulating adversarial learning as a soft-constrained optimization problem. Our approach is theoretically supported by the symmetric…
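To make the threat model in the abstract concrete (an adversary that perturbs the agent's input observation so as to lower its value), here is a toy, single-step sketch. It is not the method proposed in the paper and not code from nakanakakosuke/valt_sac. It is a generic projected sign-gradient (PGD-style) attack over an L∞ ball of radius `eps` around the true state, applied to a hypothetical linear policy and a hypothetical quadratic critic. The names `policy`, `critic`, `pgd_observation_attack`, `alpha`, `steps` and `seed` are all illustrative assumptions. The sketch only attacks a one-step critic. It does not capture the long-horizon worst case that, according to the abstract, requires alternating agent/adversary training.

```python
import random


def policy(w, obs):
    """Hypothetical deterministic linear policy: action = w . obs."""
    return sum(wi * oi for wi, oi in zip(w, obs))


def critic(w, state, action):
    """Hypothetical critic: the best action in the TRUE state is w . state,
    and value decreases quadratically with distance from that action."""
    return -(action - policy(w, state)) ** 2


def pgd_observation_attack(w, state, eps, alpha=0.05, steps=50, seed=0):
    """Projected sign-gradient descent on the critic over an L-inf ball of
    radius eps around the true state: the adversary searches for the
    observation that makes the agent's chosen action score worst."""
    rng = random.Random(seed)
    # Random start inside the ball: at the clean observation the gradient is zero.
    obs = [s + rng.uniform(-eps, eps) for s in state]
    target = policy(w, state)
    for _ in range(steps):
        a = policy(w, obs)
        # Chain rule through the linear policy: dQ/d obs_i = -2 (a - target) w_i
        grad = [-2.0 * (a - target) * wi for wi in w]
        # Step against the gradient sign (minimise Q), then clip back into the ball.
        obs = [
            min(max(o - alpha * ((g > 0) - (g < 0)), s - eps), s + eps)
            for o, g, s in zip(obs, grad, state)
        ]
    return obs


# Usage: compare the critic's value of the clean and the attacked action.
w = [1.0, -2.0, 0.5]
state = [0.3, 0.1, -0.4]
eps = 0.1
adv_obs = pgd_observation_attack(w, state, eps)
clean_q = critic(w, state, policy(w, state))
adv_q = critic(w, state, policy(w, adv_obs))
print(clean_q, adv_q)
```

For a linear policy the worst L∞ perturbation moves every coordinate to the edge of the ball in the direction of `sign(w_i)` (or all opposite), so `adv_q` should approach `-(eps * sum(|w_i|))**2`, here about -0.1225. The random start is a deliberate design choice. Starting exactly at the clean observation would give a zero gradient and the attack would never move.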


Code & Models

Repositories

nakanakakosuke/valt_sac · PyTorch · Official


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Infrastructure Resilience and Vulnerability Analysis · Domain Adaptation and Few-Shot Learning