Hierarchical Upper Confidence Bounds for Constrained Online Learning

Ali Baheri

arXiv:2410.17216 · cs.LG · October 28, 2024

TL;DR

This paper introduces a hierarchical bandit framework with constraints, proposing an algorithm that achieves near-optimal regret bounds and high-probability constraint satisfaction in complex decision hierarchies.

Contribution

The paper extends contextual bandits to hierarchical, constrained settings and proposes a new algorithm with theoretical guarantees, including minimax regret bounds.

Findings

01. Proposed the HC-UCB algorithm for hierarchical constrained bandits.

02. Established sublinear regret bounds for HC-UCB.

03. Proved high-probability constraint satisfaction and near-optimal regret.

Abstract

The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation. Traditional MAB formulations, however, do not adequately capture scenarios where decisions are structured hierarchically, involve multi-level constraints, or feature context-dependent action spaces. In this paper, we introduce the hierarchical constrained bandits (HCB) framework, which extends the contextual bandit problem to incorporate hierarchical decision structures and multi-level constraints. We propose the hierarchical constrained upper confidence bound (HC-UCB) algorithm, designed to address the complexities of the HCB problem by leveraging confidence bounds within a hierarchical setting. Our theoretical analysis establishes sublinear regret bounds…
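The abstract describes HC-UCB only at a high level ("leveraging confidence bounds within a hierarchical setting"). To make that general idea concrete, here is a minimal Python sketch of level-by-level constrained UCB selection. It is NOT the HC-UCB algorithm from the paper. It assumes a two-level hierarchy (groups, then arms inside a group), Bernoulli rewards and costs, one cost threshold per level, UCB1-style confidence radii, and an optimistic feasibility check (cost lower confidence bound at or below the threshold). The names `Node`, `select`, `run`, `group_threshold`, `arm_threshold`, and the example arm means are all hypothetical.

```python
import math
import random


class Node:
    """Running reward/cost statistics for one choice at one level (hypothetical helper)."""

    def __init__(self):
        self.pulls = 0
        self.reward_sum = 0.0
        self.cost_sum = 0.0

    def update(self, reward, cost):
        self.pulls += 1
        self.reward_sum += reward
        self.cost_sum += cost

    def bounds(self, t):
        """Return (reward UCB, cost LCB) using a UCB1-style radius sqrt(2 ln t / n)."""
        if self.pulls == 0:
            # Unvisited choices are treated as maximally promising and feasible.
            return math.inf, -math.inf
        radius = math.sqrt(2.0 * math.log(t) / self.pulls)
        return (self.reward_sum / self.pulls + radius,
                self.cost_sum / self.pulls - radius)


def select(nodes, t, threshold):
    """Highest reward UCB among nodes that look feasible (cost LCB <= threshold)."""
    best, best_ucb = None, -math.inf
    for i, node in enumerate(nodes):
        ucb, cost_lcb = node.bounds(t)
        if cost_lcb <= threshold and ucb > best_ucb:
            best, best_ucb = i, ucb
    if best is None:
        # No choice looks feasible: fall back to the lowest cost LCB.
        best = min(range(len(nodes)), key=lambda i: nodes[i].bounds(t)[1])
    return best


def run(groups, horizon, group_threshold, arm_threshold, seed=0):
    """Simulate a hypothetical two-level hierarchy: pick a group, then an arm inside it.

    groups[g][a] is a made-up (mean_reward, mean_cost) pair for Bernoulli draws.
    """
    rng = random.Random(seed)
    group_stats = [Node() for _ in groups]
    arm_stats = [[Node() for _ in arms] for arms in groups]
    total_reward = total_cost = 0.0
    for t in range(1, horizon + 1):
        # Level 1: group statistics pool every pull made inside that group.
        g = select(group_stats, t, group_threshold)
        # Level 2: arms are scored against the number of visits to their group.
        a = select(arm_stats[g], group_stats[g].pulls + 1, arm_threshold)
        mean_reward, mean_cost = groups[g][a]
        reward = 1.0 if rng.random() < mean_reward else 0.0
        cost = 1.0 if rng.random() < mean_cost else 0.0
        group_stats[g].update(reward, cost)
        arm_stats[g][a].update(reward, cost)
        total_reward += reward
        total_cost += cost
    return group_stats, arm_stats, total_reward, total_cost


if __name__ == "__main__":
    # Two hypothetical groups of (mean reward, mean cost) arms.
    groups = [[(0.9, 0.8), (0.6, 0.2)], [(0.5, 0.1), (0.4, 0.05)]]
    _, arm_stats, reward, cost = run(groups, horizon=5000,
                                     group_threshold=0.5, arm_threshold=0.3)
    print([[n.pulls for n in arms] for arms in arm_stats], reward, cost)
```

With these made-up arms, the high-reward arm (0.9, 0.8) violates the arm-level cost threshold, so once its confidence radius shrinks its cost lower bound exceeds 0.3 and it is selected only rarely, with pulls shifting to (0.6, 0.2). The optimistic cost check encourages exploration but can incur constraint violations while estimates are uncertain; checking the cost upper bound instead is the more conservative alternative.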

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Intelligent Tutoring Systems and Adaptive Learning