BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces
Matthew Landers, Taylor W. Killian, Hugo Barnes, Thomas Hartvigsen, Afsaneh Doryab

TL;DR
BraVE is a tree-structured, value-based method for offline reinforcement learning in high-dimensional discrete action spaces. It evaluates only a linear number of joint actions while capturing sub-action dependencies, and outperforms prior methods by up to 20x in large action spaces.
Contribution
The paper proposes BraVE, a novel value-based method using tree traversal to efficiently evaluate joint actions while modeling sub-action dependencies in offline RL.
Findings
Outperforms prior methods by up to 20x in large action spaces
Handles over four million actions efficiently
Preserves sub-action dependency structure
Abstract
Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to 20x in environments with over four million actions.
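To make the scaling argument concrete, the sketch below contrasts the size of a joint action space with the number of joint actions scored by a simple greedy, level-by-level traversal. It is an illustration under stated assumptions, not BraVE's algorithm: the binary sub-actions, the choice of 22 sub-actions (picked only because 2^22 exceeds four million), the `toy_value` function standing in for a learned Q-value, and the traversal rule itself are all hypothetical.

```python
from itertools import product


def joint_action_count(num_sub_actions, choices=2):
    """Size of the joint action space: choices ** num_sub_actions."""
    return choices ** num_sub_actions


def toy_value(action):
    """Hypothetical stand-in for a learned Q-value (not from the paper).

    Even-indexed sub-actions add 1.0 when active, odd-indexed ones subtract
    0.5, and activating sub-actions 0 and 1 together adds a joint bonus,
    which a per-sub-action factorization could not represent.
    """
    base = sum(1.0 if i % 2 == 0 else -0.5 for i, a in enumerate(action) if a == 1)
    bonus = 2.0 if len(action) > 1 and action[0] == 1 and action[1] == 1 else 0.0
    return base + bonus


def greedy_traversal(num_sub_actions, value_fn, choices=2):
    """Fix one sub-action per tree level, always scoring complete joint actions.

    Assumed traversal rule for illustration only. Requires num_sub_actions >= 1.
    Returns (action, value, number_of_evaluations).
    """
    action = [0] * num_sub_actions
    evaluations = 0
    best_value = None
    for depth in range(num_sub_actions):
        scored = []
        for choice in range(choices):
            candidate = action.copy()
            candidate[depth] = choice
            scored.append((value_fn(tuple(candidate)), choice))
            evaluations += 1
        best_value, best_choice = max(scored)
        action[depth] = best_choice
    return tuple(action), best_value, evaluations


def exhaustive_best(num_sub_actions, value_fn, choices=2):
    """Score every joint action; only feasible for small num_sub_actions."""
    return max(product(range(choices), repeat=num_sub_actions), key=value_fn)


if __name__ == "__main__":
    print(joint_action_count(22))            # 4194304 joint actions
    _, _, n_evals = greedy_traversal(22, toy_value)
    print(n_evals)                           # 44 joint actions scored
```

In this toy setting the traversal scores `choices * num_sub_actions` joint actions instead of `choices ** num_sub_actions`. A greedy pass like this is not guaranteed to find the best joint action in general; it matches the exhaustive search here only because of how `toy_value` is constructed.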
Taxonomy
Topics: Reinforcement Learning in Robotics
