Stochastic Q-learning for Large Discrete Action Spaces
Fares Fourati, Vaneet Aggarwal, Mohamed-Slim Alouini

TL;DR
This paper introduces stochastic value-based reinforcement learning methods that efficiently handle large discrete action spaces by considering only a small, stochastic subset of actions per iteration, reducing computational costs while maintaining high performance.
Contribution
The paper proposes novel stochastic Q-learning algorithms that consider a subset of actions, with proven convergence and superior empirical performance over traditional methods.
Findings
Outperforms baseline methods in diverse control environments
Achieves near-optimal returns with reduced computation time
Converges theoretically under certain conditions
Abstract
In complex environments with large discrete action spaces, effective decision-making is critical in reinforcement learning (RL). Despite the widespread use of value-based RL approaches like Q-learning, they come with a computational burden, necessitating the maximization of a value function over all actions in each iteration. This burden becomes particularly challenging when addressing large-scale problems and using deep neural networks as function approximators. In this paper, we present stochastic value-based RL approaches which, in each iteration, as opposed to optimizing over the entire set of actions, only consider a variable stochastic set of a sublinear number of actions, possibly as small as O(log(n)). The presented stochastic value-based RL methods include, among others, Stochastic Q-learning, StochDQN, and StochDDQN, all of which integrate this stochastic…
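The core idea in the abstract, replacing the maximization over all actions with a maximization over a small random subset, can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the helper names `stoch_argmax` and `stoch_q_update`, the uniform sampling without replacement, the subset size k = ceil(log2 n), and the step size and discount values are all assumptions made for illustration, and the paper's actual subset construction may differ.

```python
import math
import random


def stoch_argmax(q_row, k, rng):
    """Return the best action among k uniformly sampled actions.

    Assumption: candidates are drawn uniformly without replacement.
    Only k Q-values are compared, not all len(q_row) of them.
    """
    candidates = rng.sample(range(len(q_row)), k)
    return max(candidates, key=lambda a: q_row[a])


def stoch_q_update(Q, s, a, r, s_next, k, rng, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step whose bootstrap target uses the sampled max."""
    a_star = stoch_argmax(Q[s_next], k, rng)
    target = r + gamma * Q[s_next][a_star]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


# Hypothetical toy problem: 4 states, 1024 actions.
n_states, n_actions = 4, 1024
k = math.ceil(math.log2(n_actions))  # 10 candidates instead of 1024
rng = random.Random(0)

Q = [[0.0] * n_actions for _ in range(n_states)]
Q[1][7] = 1.0  # a single high-value action in state 1

new_value = stoch_q_update(Q, s=0, a=3, r=1.0, s_next=1, k=k, rng=rng)
```

With k equal to the number of actions, `stoch_argmax` reduces to the ordinary argmax, so the sketch recovers standard Q-learning as a special case. With a smaller k, each update is cheaper but may miss the best action in a given step, which is the trade-off the paper studies.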
Taxonomy
Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications · Face and Expression Recognition
Methods: Sparse Evolutionary Training · Q-Learning
