Breaking the Grid: Distance-Guided Reinforcement Learning in Large Discrete Action Spaces

Heiko Hoppe; Fabian Akkerman; Wouter van Heeswijk; Maximilian Schiffer

arXiv:2602.08616·cs.LG·May 12, 2026

TL;DR

This paper introduces Distance-Guided Reinforcement Learning (DGRL), a method for discrete action spaces with up to $10^{20}$ actions. It combines Sampled Dynamic Neighborhoods with Distance-Based Updates, and the authors report gains in both performance and convergence speed.

Contribution

DGRL is the first approach to enable scalable RL in massive discrete action spaces using stochastic volumetric exploration and stable policy regression.

Findings

01. DGRL achieves up to 66% performance improvement over state-of-the-art methods.

02. DGRL guarantees local value improvement on structured tasks.

03. DGRL improves convergence speed and reduces computational complexity.

Abstract

Reinforcement Learning (RL) is increasingly applied to large-scale decision-making problems like logistics, scheduling, and recommender systems, but existing algorithms struggle with the curse of dimensionality in such large discrete action spaces. We propose Distance-Guided Reinforcement Learning (DGRL), combining Sampled Dynamic Neighborhoods and Distance-Based Updates to enable efficient RL in problems with up to $10^{20}$ actions. Unlike prior methods, DGRL performs stochastic volumetric exploration and transforms policy optimization into a stable regression task, decoupling gradient variance from action space cardinality. On structured tasks, DGRL provably guarantees local value improvement. DGRL naturally generalizes to hybrid continuous-discrete action spaces. We demonstrate performance improvements of up to 66% against state-of-the-art benchmarks across regularly and irregularly…
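
The abstract names two components but does not give their details, so the following is only a rough illustrative sketch of the general idea, not the authors' algorithm. It assumes an actor that outputs a continuous "proto-action", a neighborhood obtained by adding Gaussian noise and rounding to the integer grid, and a squared Euclidean distance as the regression loss. All function names (`sample_neighborhood`, `select_action`, `distance_loss`, `regression_step`), the noise model, and the parameters `radius`, `n_samples`, and `lr` are hypothetical.

```python
import random


def sample_neighborhood(proto, low, high, radius, n_samples, rng):
    """Sample integer action vectors around a continuous proto-action.

    Assumption: Gaussian perturbation followed by rounding and clipping.
    The paper's actual sampling scheme may differ.
    """
    candidates = []
    for _ in range(n_samples):
        candidate = tuple(
            min(high, max(low, round(x + rng.gauss(0.0, radius))))
            for x in proto
        )
        candidates.append(candidate)
    return candidates


def select_action(proto, q_fn, low, high, radius=1.5, n_samples=64, rng=None):
    """Evaluate only the sampled neighbors and return the highest-valued one."""
    rng = rng or random.Random()
    candidates = sample_neighborhood(proto, low, high, radius, n_samples, rng)
    return max(candidates, key=q_fn)


def distance_loss(proto, action):
    """Squared Euclidean distance between the actor output and a target action."""
    return sum((p - a) ** 2 for p, a in zip(proto, action))


def regression_step(proto, action, lr=0.1):
    """One gradient step on distance_loss with respect to the proto-action.

    In a real agent the gradient would flow into the actor's parameters;
    here the proto-action itself is updated to keep the sketch short.
    """
    return [p - lr * 2.0 * (p - a) for p, a in zip(proto, action)]


if __name__ == "__main__":
    rng = random.Random(0)
    goal = (3, 7, 5)  # hypothetical optimum of a toy value function

    def q_fn(action):
        return -sum((x - g) ** 2 for x, g in zip(action, goal))

    proto = [2.3, 7.8, 4.1]
    best = select_action(proto, q_fn, low=0, high=9, rng=rng)
    proto = regression_step(proto, best)
    print(best, proto)
```

The point of the sketch is the cost structure: each decision evaluates `n_samples` candidates regardless of how many actions exist. With, say, 20 dimensions of 10 levels each, the grid contains $10^{20}$ actions, yet the loop above would still score only 64 of them, and the update is a plain regression toward the chosen neighbor.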
