SAINT: Attention-Based Policies for Discrete Combinatorial Action Spaces

Matthew Landers; Taylor W. Killian; Thomas Hartvigsen; Afsaneh Doryab

arXiv:2505.12109 · cs.LG · February 2, 2026

TL;DR

SAINT is a novel transformer-based policy architecture that effectively models complex joint dependencies in large combinatorial action spaces, improving reinforcement learning performance across diverse environments.

Contribution

Introduces SAINT, a permutation-invariant transformer architecture for combinatorial actions, capturing dependencies and enhancing sample efficiency in RL.

Findings

01 Outperforms baselines in 18 environments

02 Handles up to 1.35 quintillion actions

03 Models complex joint action dependencies
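To make the scale in the second finding concrete, here is a minimal sketch of how the number of joint actions grows as the product of the sub-action cardinalities. The listing does not state which factorization produces the 1.35 quintillion figure; the configuration of 38 sub-actions with 3 choices each is an assumption, chosen only because it yields the same magnitude. The helper name `joint_action_count` is hypothetical.

```python
import math


def joint_action_count(cardinalities):
    """Number of joint actions when each sub-action is chosen independently.

    `cardinalities` lists how many choices each sub-action has; the joint
    action space is their Cartesian product, so its size is the product.
    """
    return math.prod(cardinalities)


# Assumed configuration (not taken from the source): 38 sub-actions, 3 choices each.
n_actions = joint_action_count([3] * 38)
print(f"{n_actions:.2e}")  # 1.35e+18

# Adding one more ternary sub-action triples the size of the space.
print(joint_action_count([3] * 39) // n_actions)  # 3
```

A flat policy head with one output per joint action is infeasible at this size, which is the motivation for architectures that operate on sub-actions instead.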

Abstract

The combinatorial structure of many real-world action spaces leads to exponential growth in the number of possible actions, limiting the effectiveness of conventional reinforcement learning algorithms. Recent approaches for combinatorial action spaces impose factorized or sequential structures over sub-actions, failing to capture complex joint behavior. We introduce the Sub-Action Interaction Network using Transformers (SAINT), a novel policy architecture that represents multi-component actions as unordered sets and models their dependencies via self-attention conditioned on the global state. SAINT is permutation-invariant, sample-efficient, and compatible with standard policy optimization algorithms. In 18 distinct combinatorial environments across three task domains, including environments with $1.35 \times 10^{18}$ possible actions, SAINT consistently outperforms strong baselines.
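The idea described in the abstract can be illustrated with a minimal NumPy sketch: one token per sub-action, conditioned on the global state, mixed by a single self-attention layer with no positional encoding. This is not the authors' implementation. The additive state conditioning, the single attention head, the per-sub-action categorical output heads, and all names (`init_params`, `policy_logits`, `sample_action`) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, n_sub, n_choices, state_dim, d_model):
    """Random weights for a toy policy (all shapes are illustrative)."""
    def w(*shape):
        return rng.normal(0.0, 0.1, shape)
    return {
        "slot_emb": w(n_sub, d_model),     # one learnable embedding per sub-action
        "W_state": w(state_dim, d_model),  # projects the global state (assumed additive)
        "W_q": w(d_model, d_model),
        "W_k": w(d_model, d_model),
        "W_v": w(d_model, d_model),
        "W_out": w(d_model, n_choices),    # output head shared by all sub-actions
    }


def policy_logits(params, state):
    """Return logits of shape (n_sub, n_choices) for one state vector."""
    # Condition every sub-action token on the same global state.
    tokens = params["slot_emb"] + state @ params["W_state"]
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    # Self-attention over the set of sub-actions; no positional encoding is used.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    mixed = tokens + attn @ v  # residual connection
    return mixed @ params["W_out"]


def sample_action(params, state, rng):
    """Sample one choice per sub-action from the per-slot categorical distributions."""
    probs = softmax(policy_logits(params, state))
    return np.array([rng.choice(len(p), p=p) for p in probs])


rng = np.random.default_rng(0)
params = init_params(rng, n_sub=5, n_choices=3, state_dim=4, d_model=8)
state = rng.normal(size=4)
logits = policy_logits(params, state)

# Reordering the sub-action tokens reorders the outputs in exactly the same way.
perm = np.array([2, 0, 4, 1, 3])
shuffled = dict(params, slot_emb=params["slot_emb"][perm])
print(np.allclose(policy_logits(shuffled, state), logits[perm]))  # True
```

Because nothing in the layer depends on token order, the sub-actions are treated as a set: permuting the inputs permutes the outputs correspondingly. The resulting per-sub-action distributions give a log-probability for the joint action, which is the quantity policy-gradient methods such as PPO require.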

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The authors provide ablations showing the robustness of the proposed method on varying dimensionality and varying sub-action dependence.

Weaknesses

The proposed method can have high computational cost. When the action space is large, the learnable embedding vector e_i is high-dimensional, and adding state conditioning further increases the dimensionality.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The proposed approach can model complex, context-sensitive dependencies in large action spaces. It is permutation invariant, i.e., naturally fits unordered action compositions.
- The evaluation conducted is extensive and compelling. I appreciate the ablations. Experiments show that the proposed approach consistently outperforms baselines on diverse tasks: state-independent (traffic control), state-dependent (navigation), and weakly dependent (discretized MuJoCo). The scalability of the appro

Weaknesses

- The approach may be less justified for low-dimensional or weakly structured domains.

Suggestions:

- Since combinatorial action spaces are common in offline RL (e.g., healthcare), systematic analysis in off-policy contexts could further establish SAINT's utility.

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. **Clarity:** The paper is written with outstanding clarity, making the problem, prior work, and the proposed method very easy to understand.
2. **Problem Formulation:** The authors correctly identify a key limitation of existing approaches, namely the rigid and often incorrect inductive bias of a fixed autoregressive ordering.
3. **Architectural Fit:** The idea of using a permutation-equivariant architecture is an elegant and principled solution for the *specific class of problems* where s

Weaknesses

1. **Incremental Novelty:** The technical contribution is thin. The method consists of a known neural architecture (self-attention on an unordered set) plugged into a standard, on-policy algorithm (PPO). This is an exercise in architectural engineering, not a new method, and its novelty is limited.
2. **Fundamentally Questionable Inductive Bias:** The paper's entire motivation rests on the assumption that permutation-equivariance is a *universally desirable* property. This is a strong and, in

Code & Models

Repositories

matthewlanders/saint
pytorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Dense Connections · Feedforward Network · CutMix · Mixup · SAINT