Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms

Renos Zabounidis, Roy Siegelmann, Mohamad Qadri, Woojun Kim, Simon Stepputtis, Katia P. Sycara

arXiv:2603.09090 · cs.LG · March 11, 2026

TL;DR

This paper identifies a failure mode of unmasked policy gradient algorithms in which valid actions are suppressed at unvisited states because network parameters are shared across states. It proposes feasibility classification as a remedy that improves performance in environments with state-dependent action validity constraints.

Contribution

The paper shows that parameter sharing causes unmasked policies to systematically suppress valid actions, and proposes feasibility classification as a practical remedy.
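The page does not describe how the feasibility classifier is built. The sketch below is a minimal illustration under stated assumptions: one logistic-regression classifier per action, trained on hand-made state features and on validity labels assumed to be observable during training. Its predicted mask replaces the oracle mask. All names, data, and hyperparameters are hypothetical and are not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def feasibility_logit(w, x):
    # w holds one weight per state feature plus a trailing bias term.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_feasibility(features, labels, n_actions, lr=0.5, epochs=200):
    """Fit one logistic-regression validity classifier per action with plain SGD."""
    dim = len(features[0])
    weights = [[0.0] * (dim + 1) for _ in range(n_actions)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            for a in range(n_actions):
                w = weights[a]
                # Gradient of binary cross-entropy w.r.t. the logit is (p - y).
                g = sigmoid(feasibility_logit(w, x)) - y[a]
                for i in range(dim):
                    w[i] -= lr * g * x[i]
                w[-1] -= lr * g
    return weights


def predicted_mask(weights, x, threshold=0.5):
    """Learned stand-in for the oracle mask: True where the action is predicted valid."""
    return [sigmoid(feasibility_logit(w, x)) >= threshold for w in weights]


# Hypothetical toy data: action 0 is always valid, action 1 is valid iff x[0] > 0.
features = [[1.0, 0.0], [2.0, 1.0], [0.5, -0.5], [-1.0, 0.5], [-2.0, -1.0], [-0.5, 1.0]]
labels = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0], [1, 0]]
weights = train_feasibility(features, labels, n_actions=2)
print(predicted_mask(weights, [1.5, 0.0]), predicted_mask(weights, [-1.5, 0.0]))
```

At deployment, the predicted mask would zero out actions predicted invalid in the policy's softmax, taking the place of an oracle mask. The 0.5 threshold is an arbitrary choice here. A real agent would more plausibly share a feature extractor with the policy than use hand-made features.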

Findings

01. Unmasked training suppresses valid actions at states the agent has not yet visited.
02. Feasibility classification enables deployment without oracle masks.
03. Empirical results confirm the predicted exponential suppression and the effectiveness of the proposed method.
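The following toy simulation makes the suppression mechanism concrete. It is not a reproduction of the paper's experiments. It uses a linear softmax policy over two actions whose parameters are shared across states. The policy is trained by exact expected policy gradient only at two visited states, where action 1 is invalid and penalized with reward -1. The unvisited state s* shares the bias feature, and action 1 is valid there, yet π(a1 | s*) is driven down anyway. The state features, rewards, learning rate, and step count are all illustrative assumptions.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def policy(theta, phi):
    # Linear softmax policy: logit_a(s) = theta[a] . phi(s); theta is shared by all states.
    return softmax([sum(t * f for t, f in zip(theta_a, phi)) for theta_a in theta])


def train_unmasked(visited, rewards, unvisited, steps=200, lr=0.5):
    """Gradient ascent on exact expected reward at visited states only.

    Returns pi(action 1 | s*) at the unvisited state before and after every step.
    """
    n_actions, dim = len(rewards), len(visited[0])
    theta = [[0.0] * dim for _ in range(n_actions)]
    history = [policy(theta, unvisited)[1]]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n_actions)]
        for phi in visited:
            pi = policy(theta, phi)
            expected_r = sum(p * r for p, r in zip(pi, rewards))
            for b in range(n_actions):
                # d/d theta_b of sum_a pi_a r_a  =  pi_b (r_b - expected_r) phi(s)
                coef = pi[b] * (rewards[b] - expected_r) / len(visited)
                for i in range(dim):
                    grad[b][i] += coef * phi[i]
        for b in range(n_actions):
            for i in range(dim):
                theta[b][i] += lr * grad[b][i]
        # s* is never used in the update; we only read off its action-1 probability.
        history.append(policy(theta, unvisited)[1])
    return history


# Hypothetical toy: feature 0 is a shared bias; action 1 is invalid (penalty -1)
# at both visited states and valid at s*, which is never visited.
visited_states = [[1.0, 1.0], [1.0, 0.5]]
s_star = [1.0, 0.0]
history = train_unmasked(visited_states, rewards=[0.0, -1.0], unvisited=s_star)
print(f"pi(a1|s*): start {history[0]:.3f}, end {history[-1]:.3f}")
```

Every update at the visited states also moves the weights on the shared bias feature. As a result, the logit gap at s* changes even though s* contributes no gradient of its own. The toy only illustrates the direction of the effect. It does not reproduce the paper's bound.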

Abstract

In reinforcement learning environments with state-dependent action validity, action masking consistently outperforms penalty-based handling of invalid actions, yet existing theory only shows that masking preserves the policy gradient theorem. We identify a distinct failure mode of unmasked training: it systematically suppresses valid actions at states the agent has not yet visited. This occurs because gradients pushing down invalid actions at visited states propagate through shared network parameters to unvisited states where those actions are valid. We prove that for softmax policies with shared features, when an action is invalid at visited states but valid at an unvisited state $s^{*}$, the probability $\pi(a \mid s^{*})$ is bounded by exponential decay due to parameter sharing and the zero-sum identity of softmax logits. This bound reveals that entropy regularization trades off between…
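The contrast between penalty-based handling and masking is visible at the level of the logits. The sketch below assumes a plain softmax over a hypothetical 3-action logit vector. The gradient of log π(a | s) with respect to the logits is one_hot(a) - π, and its components always sum to zero. We assume this standard identity is the "zero-sum identity" the abstract refers to. Under a mask, the invalid action's logit receives exactly zero gradient, whichever valid action is sampled.

```python
import math


def softmax(logits, mask=None):
    """Softmax over actions; mask[i] False means action i is invalid and gets probability 0."""
    if mask is None:
        mask = [True] * len(logits)
    m = max(z for z, ok in zip(logits, mask) if ok)
    exps = [math.exp(z - m) if ok else 0.0 for z, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def logprob_grad(logits, action, mask=None):
    """Gradient of log pi(action) w.r.t. the logits: one_hot(action) - pi."""
    pi = softmax(logits, mask)
    return [(1.0 if b == action else 0.0) - p for b, p in enumerate(pi)]


logits = [0.5, -0.2, 1.0]  # hypothetical logits for one state

# Unmasked: sampling invalid action 1 and receiving a penalty r < 0 moves the logits
# along r * (one_hot(1) - pi). Logit 1 goes down, the others go up, and the
# components of the gradient sum to zero.
unmasked = logprob_grad(logits, action=1)
print([round(g, 3) for g in unmasked])

# Masked: action 1 is never sampled, and its logit gets zero gradient from the
# sampled valid action, so nothing pushes it down through shared parameters.
masked = logprob_grad(logits, action=2, mask=[True, False, True])
print(masked[1])
```

In a network with shared parameters, the unmasked push on logit 1 is what propagates to other states. Masking removes that term at its source.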

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning