BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

Matthew Landers; Taylor W. Killian; Hugo Barnes; Thomas Hartvigsen; Afsaneh Doryab

arXiv:2410.21151 · cs.LG · January 9, 2026


TL;DR

BraVE introduces a tree-structured approach to offline reinforcement learning in high-dimensional discrete action spaces. It evaluates a linear number of joint actions while capturing sub-action dependencies, and it outperforms prior offline RL methods by up to 20× in environments with over four million actions.

Contribution

The paper proposes BraVE, a novel value-based method using tree traversal to efficiently evaluate joint actions while modeling sub-action dependencies in offline RL.
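To make the idea of tree traversal over joint actions concrete, here is a minimal sketch of a generic greedy traversal that fixes one sub-action per tree level. It is not the authors' procedure: the function names `greedy_tree_traversal` and `toy_branch_value`, the scoring rule, and the 22 binary sub-actions are all assumptions chosen for illustration. The point is only that conditioning each choice on the prefix chosen so far lets a score depend on earlier sub-actions, while the number of evaluations grows with the sum of the sub-action cardinalities rather than their product.

```python
from typing import Callable, Sequence, Tuple


def greedy_tree_traversal(
    cardinalities: Sequence[int],
    branch_value: Callable[[Tuple[int, ...], int], float],
) -> Tuple[Tuple[int, ...], int]:
    """Pick one sub-action per tree level, conditioning on the prefix so far.

    Returns the chosen joint action and the number of value evaluations used.
    """
    prefix: Tuple[int, ...] = ()
    evaluations = 0
    for n_choices in cardinalities:
        # Score every child of the current node (one child per sub-action value).
        scores = [branch_value(prefix, choice) for choice in range(n_choices)]
        evaluations += n_choices
        # Descend into the highest-scoring child.
        best = max(range(n_choices), key=scores.__getitem__)
        prefix = prefix + (best,)
    return prefix, evaluations


def toy_branch_value(prefix: Tuple[int, ...], choice: int) -> float:
    """Hypothetical score with a dependency on the previous sub-action."""
    if not prefix:
        return float(choice)
    # Reward differing from the previous sub-action (an assumed toy dependency).
    return 1.0 if choice != prefix[-1] else 0.0


# Assumed setting: 22 binary sub-actions.
action, n_evals = greedy_tree_traversal([2] * 22, toy_branch_value)
print(action[:4], n_evals)
```

In this toy setting the traversal scores two children at each of 22 levels, whereas exhaustively scoring every joint action would require 2^22 evaluations. A learned value function would take the place of `toy_branch_value` in any real method.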

Findings

01 Outperforms prior methods by up to 20× in large action spaces

02 Handles over four million actions efficiently

03 Preserves sub-action dependency structure

Abstract

Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20 \times$ in environments with over four million actions.
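The scaling argument in the abstract can be checked with a little arithmetic. The sketch below assumes 22 binary sub-actions, one configuration that yields just over four million joint actions; the abstract does not state the actual environment configuration, and the helper names are hypothetical. It also reads "a linear number" as the sum of the sub-action cardinalities, which is an interpretation rather than a figure from the paper.

```python
from math import prod
from typing import Sequence


def joint_action_count(cardinalities: Sequence[int]) -> int:
    """Size of the joint action space: the product of sub-action cardinalities."""
    return prod(cardinalities)


def linear_budget(cardinalities: Sequence[int]) -> int:
    """An assumed 'linear' evaluation budget: the sum of sub-action cardinalities."""
    return sum(cardinalities)


# Assumed configuration: 22 binary sub-actions.
binary_22 = [2] * 22
exhaustive = joint_action_count(binary_22)
linear = linear_budget(binary_22)

print(exhaustive)  # 4194304 joint actions (2**22)
print(linear)      # 44 evaluations under the assumed linear budget
```

Adding one more binary sub-action doubles the joint count but adds only two to the linear budget, which is why exhaustive evaluation becomes infeasible as the number of sub-actions grows.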


Videos

BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics