Representative Action Selection for Large Action Space: From Bandits to MDPs
Quan Zhou, Shie Mannor

TL;DR
This paper introduces a method for selecting a small, representative subset of actions in large action spaces across multiple RL environments, enabling efficient learning while maintaining near-optimal performance.
Contribution
It extends the authors' prior meta-bandit results to the MDP setting and proves guarantees for their existing algorithm under a relaxed, non-centered sub-Gaussian process model.
Findings
The selected action subset achieves performance comparable to learning over the full action space.
Provides theoretical guarantees under a more general environmental model.
Offers a computationally efficient approach for large action spaces.
Abstract
We study the problem of selecting a small, representative action subset from an extremely large action space shared across a family of reinforcement learning (RL) environments -- a fundamental challenge in applications like inventory management and recommendation systems, where direct learning over the entire space is intractable. Our goal is to identify a fixed subset of actions that, for every environment in the family, contains a near-optimal action, thereby enabling efficient learning without exhaustively evaluating all actions. This work extends our prior results for meta-bandits to the more general setting of Markov Decision Processes (MDPs). We prove that our existing algorithm achieves performance comparable to using the full action space. This theoretical guarantee is established under a relaxed, non-centered sub-Gaussian process model, which accommodates greater…
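To make the stated goal concrete (a fixed subset that contains a near-optimal action for every environment), here is a minimal sketch. It is not the authors' algorithm. It assumes a small, fully known matrix of expected rewards (environments by actions) and uses a plain greedy set-cover heuristic. The names `greedy_representative_actions`, `worst_case_regret`, and the tolerance `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def greedy_representative_actions(rewards, eps):
    """Pick a subset of actions so that every environment has an action
    in the subset within eps of its best action.

    rewards: (n_envs, n_actions) expected rewards, assumed known here
             (an illustrative simplification, not the paper's setting).
    """
    rewards = np.asarray(rewards, dtype=float)
    best = rewards.max(axis=1, keepdims=True)
    # covers[e, a] is True when action a is eps-optimal in environment e.
    covers = rewards >= best - eps
    uncovered = np.ones(rewards.shape[0], dtype=bool)
    subset = []
    while uncovered.any():
        # Count how many still-uncovered environments each action would cover.
        gains = covers[uncovered].sum(axis=0)
        a = int(gains.argmax())
        subset.append(a)
        uncovered &= ~covers[:, a]
    return subset

def worst_case_regret(rewards, subset):
    """Largest gap, over environments, between the best action overall
    and the best action inside the subset."""
    rewards = np.asarray(rewards, dtype=float)
    gap = rewards.max(axis=1) - rewards[:, subset].max(axis=1)
    return float(gap.max())
```

The loop always terminates because each environment's own best action covers it, so at least one uncovered environment is removed per iteration. In the setting the abstract describes, rewards are not known in advance and the action space is too large to enumerate, so this sketch only illustrates the covering objective, not how it would be achieved at scale.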
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics
