Multi-gear bandits, partial conservation laws, and indexability
José Niño-Mora

TL;DR
This paper introduces multi-gear bandits, a class of Markov decision processes whose actions are gears of operation ordered by increasing resource consumption. It gives conditions under which such models are indexable, along with an algorithm for computing the index that characterizes their optimal policies.
Contribution
It establishes new PCL-indexability conditions for multi-gear bandits and develops an efficient algorithm to compute the dynamic allocation index (DAI), which characterizes the optimal policies.
Findings
Model is indexable under proposed conditions
Efficient algorithm computes the dynamic allocation index
Index policy outperforms baseline strategies
Abstract
This paper considers what we propose to call multi-gear bandits, which are Markov decision processes modeling a generic dynamic and stochastic project fueled by a single resource and which admit multiple actions representing gears of operation naturally ordered by their increasing resource consumption. The optimal operation of a multi-gear bandit aims to strike a balance between project performance costs or rewards and resource usage costs, which depend on the resource price. A computationally convenient and intuitive optimal solution is available when such a model is indexable, meaning that its optimal policies are characterized by a dynamic allocation index (DAI), a function of state-action pairs representing critical resource prices. Motivated by the lack of general indexability conditions and efficient index-computing schemes, and focusing on the infinite-horizon finite-state and…
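To make the trade-off between project rewards and resource usage costs concrete, here is a minimal sketch of a price-dependent optimal policy on a toy multi-gear bandit. Everything in it is an illustrative assumption rather than material from the paper: the two-state, three-gear instance, the numbers, the discount factor, and the function name `optimal_gears`. It uses plain value iteration, not the paper's index algorithm.

```python
import numpy as np

def optimal_gears(r, q, P, lam, beta=0.9, tol=1e-10, max_iter=10_000):
    """Return the optimal gear in each state for resource price `lam`.

    r, q : arrays of shape (A+1, N) with rewards and resource consumption
           for each (gear, state) pair; q increases with the gear.
    P    : array of shape (A+1, N, N) with one transition matrix per gear.
    Plain value iteration on the discounted objective with net one-step
    reward r - lam * q (a generic method, assumed here for illustration).
    """
    net = r - lam * q
    V = np.zeros(r.shape[1])
    for _ in range(max_iter):
        Q = net + beta * (P @ V)          # shape (A+1, N)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = net + beta * (P @ V)
    return Q.argmax(axis=0)               # greedy gear per state

# Hypothetical toy instance: state 0 = "bad", state 1 = "good".
r = np.tile([0.0, 1.0], (3, 1))                      # reward depends on state only
q = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])   # gear a consumes a units
p_good = [0.2, 0.6, 0.8]                             # higher gear -> more likely "good"
P = np.array([[[1 - p, p], [1 - p, p]] for p in p_good])

if __name__ == "__main__":
    for lam in (0.1, 0.25, 0.5):
        print(f"price {lam}: gears {optimal_gears(r, q, P, lam).tolist()}")
```

In this toy instance the chosen gear drops from the top gear to the lowest one as the resource price rises. The prices at which the optimal gear switches play the role of the critical resource prices that a dynamic allocation index is meant to capture.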
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications
