Hierarchical Upper Confidence Bounds for Constrained Online Learning
Ali Baheri

TL;DR
This paper introduces a hierarchical bandit framework with constraints, proposing an algorithm that achieves near-optimal regret bounds and high-probability constraint satisfaction in complex decision hierarchies.
Contribution
It extends contextual bandits to hierarchical, constrained settings and provides a novel algorithm with theoretical guarantees and minimax regret bounds.
Findings
Proposed the HC-UCB algorithm for hierarchical constrained bandits.
Established sublinear regret bounds for HC-UCB.
Proved high-probability constraint satisfaction and near-optimal regret.
Abstract
The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation. Traditional MAB formulations, however, do not adequately capture scenarios where decisions are structured hierarchically, involve multi-level constraints, or feature context-dependent action spaces. In this paper, we introduce the hierarchical constrained bandits (HCB) framework, which extends the contextual bandit problem to incorporate hierarchical decision structures and multi-level constraints. We propose the hierarchical constrained upper confidence bound (HC-UCB) algorithm, designed to address the complexities of the HCB problem by leveraging confidence bounds within a hierarchical setting. Our theoretical analysis establishes sublinear regret bounds…
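The abstract does not give HC-UCB's update rules. The following is therefore only a generic illustration of the idea it names: upper confidence bounds applied level by level in a decision hierarchy, with a constraint checked at each level. It assumes a two-level hierarchy (groups, then arms inside a group), Bernoulli rewards and costs, a UCB1-style radius sqrt(c log t / n) with c = 2, and a feasibility rule that admits an option only if the upper confidence bound on its cost stays within a budget. All names (`ArmStats`, `select`, `hierarchical_step`, `run`) and parameters are hypothetical. This is a sketch under those assumptions, not the author's algorithm, and it carries none of the paper's guarantees.

```python
# Hypothetical sketch of a two-level constrained UCB; NOT the paper's HC-UCB.
import math
import random
from collections import Counter


class ArmStats:
    """Running reward/cost totals for one node (a group or a leaf arm)."""

    def __init__(self):
        self.n = 0
        self.reward_sum = 0.0
        self.cost_sum = 0.0

    def update(self, reward, cost):
        self.n += 1
        self.reward_sum += reward
        self.cost_sum += cost

    def means(self):
        return self.reward_sum / self.n, self.cost_sum / self.n

    def radius(self, t, c=2.0):
        # UCB1-style confidence radius sqrt(c * log t / n); c = 2 is an assumed constant.
        return math.sqrt(c * math.log(max(t, 2)) / self.n)


def select(stats, t, budget):
    """Within one level: optimistic on reward, pessimistic on cost."""
    for key, s in stats.items():
        if s.n == 0:  # try every option once before trusting confidence bounds
            return key
    feasible = []
    for key, s in stats.items():
        r_hat, c_hat = s.means()
        rad = s.radius(t)
        if c_hat + rad <= budget:  # upper cost bound must respect the budget
            feasible.append((r_hat + rad, key))
    if not feasible:
        # Nothing is certifiably safe yet: fall back to the lowest estimated cost.
        return min(stats, key=lambda k: stats[k].means()[1])
    return max(feasible, key=lambda pair: pair[0])[1]


def hierarchical_step(top, sub, t, top_budget, leaf_budget):
    """Choose a group at the top level, then a leaf arm inside that group."""
    group = select(top, t, top_budget)
    arm = select(sub[group], t, leaf_budget)
    return group, arm


def run(env, horizon, top_budget, leaf_budget, seed=0):
    """Simulate Bernoulli feedback; env[group][arm] = (reward_prob, cost_prob)."""
    rng = random.Random(seed)
    top = {g: ArmStats() for g in env}
    sub = {g: {a: ArmStats() for a in env[g]} for g in env}
    picks = []
    for t in range(1, horizon + 1):
        g, a = hierarchical_step(top, sub, t, top_budget, leaf_budget)
        p_reward, p_cost = env[g][a]
        reward = 1.0 if rng.random() < p_reward else 0.0
        cost = 1.0 if rng.random() < p_cost else 0.0
        top[g].update(reward, cost)  # group-level feedback is pooled over its arms
        sub[g][a].update(reward, cost)
        picks.append((g, a))
    return picks


if __name__ == "__main__":
    # Hypothetical environment: arm "a" pays well but is too costly for a 0.5 budget.
    env = {"g1": {"a": (0.9, 0.8), "b": (0.6, 0.2)}, "g2": {"c": (0.5, 0.1)}}
    print(Counter(run(env, horizon=2000, top_budget=0.5, leaf_budget=0.5)))
```

Pooling feedback at the group level is just one simple way to give the top level its own estimates. Requiring the upper cost bound to stay within budget makes the rule conservative: early on, when the radii are wide, it tends to fall back to the cheapest-looking option rather than risk a violation.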
