When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning
Arahan Kujur

TL;DR
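This paper studies adversarial action masking in self-play reinforcement learning, where an attacker removes legal actions from a victim's action set before the victim acts. Learned masking damages Q-learning, PPO, NFSP, neural NFSP, and DQN victims more than random masking does, which identifies action availability as a distinct robustness surface.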
Contribution
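It introduces adversarial action removal, shows that the attack is effective across multiple RL algorithms and domains (poker games from 6 to 5,531 information states plus two non-poker domains), and explains its impact through reach-weighted contingent action capacity (CAC), which captures the adversary's targeting of high-value decision points.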
Findings
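Learned masking causes substantially more damage than random masking and learned perturbation baselines.
The attack persists across Q-learning, PPO, NFSP, neural NFSP, and DQN victims, transfers between agents, is amplified by self-play, and shows no recovery under extended masked training.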
Action availability is a critical robustness surface in self-play RL.
Abstract
We study adversarial action masking in self-play reinforcement learning: an attacker selectively removes legal actions from a victim's action set. Unlike observation or action perturbations, removal eliminates decision options before the agent acts. Across poker games scaling from 6 to 5,531 information states and two non-poker domains, learned masking causes substantially more damage than random masking and learned perturbation baselines. The attack persists across Q-learning, PPO, NFSP, neural NFSP, and DQN victims; transfers across agents; is amplified by self-play; and shows no recovery under extended masked training. Mechanistically, the adversary targets high-value decision points, captured by reach-weighted contingent action capacity (CAC) and a value-weighted refinement CAC. These results identify action availability as a distinct robustness surface in self-play RL.
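The threat model in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy Q-values, and the budget parameter are all hypothetical, and `greedy_value_mask` is a simple hand-written stand-in for a learned adversary, used only to contrast targeted removal with the random baseline. The sketch also assumes the attacker must leave at least one legal action available.

```python
import random

def masked_greedy_action(q_values, legal_actions, removed):
    """Victim's greedy choice after the attacker removes some legal actions."""
    # Removal happens before the victim acts: removed options are simply unavailable.
    remaining = [a for a in legal_actions if a not in removed]
    if not remaining:
        # Assumption: a valid mask always leaves at least one legal action.
        raise ValueError("mask must leave at least one legal action")
    return max(remaining, key=lambda a: q_values[a])

def _cap(budget, legal_actions):
    # Never remove every action; clamp the budget to [0, n - 1].
    return max(0, min(budget, len(legal_actions) - 1))

def random_mask(legal_actions, budget, rng):
    """Random baseline: remove `budget` legal actions uniformly at random."""
    return set(rng.sample(list(legal_actions), _cap(budget, legal_actions)))

def greedy_value_mask(q_values, legal_actions, budget):
    """Hypothetical stand-in for a learned adversary: remove the top-valued actions."""
    ranked = sorted(legal_actions, key=lambda a: q_values[a], reverse=True)
    return set(ranked[:_cap(budget, legal_actions)])

# Toy decision point with illustrative (made-up) action values.
q = {"fold": -1.0, "call": 0.2, "raise": 0.9}
legal = ["fold", "call", "raise"]

unmasked = masked_greedy_action(q, legal, set())
targeted = masked_greedy_action(q, legal, greedy_value_mask(q, legal, 1))
```

In this toy case the unmasked victim picks `raise`, while the targeted mask removes `raise` and forces the lower-valued `call`. A random mask of the same budget removes the best action only some of the time, which is the intuition behind comparing learned and random masking.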
