Representative Action Selection for Large Action Space: From Bandits to MDPs

Quan Zhou; Shie Mannor

arXiv:2511.22104 · cs.LG · December 1, 2025

Open Access

TL;DR

This paper introduces a method for selecting a small, representative subset of actions in large action spaces across multiple RL environments, enabling efficient learning while maintaining near-optimal performance.

Contribution

The paper extends the authors' prior meta-bandit results to the MDP setting and proves that the existing algorithm retains its guarantees under a relaxed, non-centered sub-Gaussian process model.

Findings

01 Achieves performance comparable to full action space methods.

02 Provides theoretical guarantees under a more general environmental model.

03 Offers a computationally efficient approach for large action spaces.

Abstract

We study the problem of selecting a small, representative action subset from an extremely large action space shared across a family of reinforcement learning (RL) environments -- a fundamental challenge in applications like inventory management and recommendation systems, where direct learning over the entire space is intractable. Our goal is to identify a fixed subset of actions that, for every environment in the family, contains a near-optimal action, thereby enabling efficient learning without exhaustively evaluating all actions. This work extends our prior results for meta-bandits to the more general setting of Markov Decision Processes (MDPs). We prove that our existing algorithm achieves performance comparable to using the full action space. This theoretical guarantee is established under a relaxed, non-centered sub-Gaussian process model, which accommodates greater…
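
As a rough illustration of the selection criterion in the abstract (a fixed subset that contains a near-optimal action for every environment), here is a minimal Python sketch. It is not the authors' algorithm, which this page does not spell out. It swaps in a plain greedy set-cover heuristic over a finite sample of environments. The names `select_representatives`, `worst_case_gap`, `values`, and `epsilon` are hypothetical, and the value table is synthetic.

```python
import random


def select_representatives(values, epsilon):
    """Greedily pick actions so each sampled environment has an epsilon-optimal one.

    values[e][a] is an assumed estimate of the value of action a in sampled
    environment e; epsilon >= 0 is the tolerated sub-optimality.
    Returns a sorted list of chosen action indices.
    """
    n_envs = len(values)
    n_actions = len(values[0])
    # covers[a] = set of environments in which action a is epsilon-optimal.
    covers = [set() for _ in range(n_actions)]
    for e, row in enumerate(values):
        best = max(row)
        for a, v in enumerate(row):
            if v >= best - epsilon:
                covers[a].add(e)
    uncovered = set(range(n_envs))
    chosen = []
    while uncovered:
        # Pick the action that covers the most still-uncovered environments.
        a_star = max(range(n_actions), key=lambda a: len(covers[a] & uncovered))
        chosen.append(a_star)
        uncovered -= covers[a_star]
    return sorted(chosen)


def worst_case_gap(values, subset):
    """Largest value lost, over sampled environments, by restricting to subset."""
    return max(max(row) - max(row[a] for a in subset) for row in values)


# Synthetic demo: 30 sampled environments, 200 shared actions (illustrative data only).
rng = random.Random(0)
demo_values = [[rng.random() for _ in range(200)] for _ in range(30)]
demo_subset = select_representatives(demo_values, epsilon=0.05)
print(len(demo_subset), "actions kept; worst-case gap:",
      worst_case_gap(demo_values, demo_subset))
```

By construction, the returned subset loses at most `epsilon` in value on each sampled environment. The sketch makes no claim about environments outside the sample, which is the part that the paper's theoretical guarantee addresses.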

Taxonomy

Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics