Multi-objective Contextual Multi-armed Bandit with a Dominant Objective

Cem Tekin; Eralp Turgay

arXiv:1708.05655·cs.LG·June 4, 2018

TL;DR

This paper introduces a multi-objective contextual multi-armed bandit problem with two objectives, one of which dominates the other (CMAB-DO), and proposes an algorithm, MOC-MAB, that achieves sublinear regret and is evaluated on synthetic and real datasets.

Contribution

The paper formulates the CMAB-DO problem, analyzes the structure of the optimal arm, and proposes MOC-MAB, an algorithm with proven sublinear regret bounds.

Findings

01  MOC-MAB achieves sublinear 2D and Pareto regret.

02  The optimal arm lies on the Pareto front.

03  The algorithm performs well on synthetic and real datasets.
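
The second finding can be illustrated with a small sketch. This is not the paper's MOC-MAB algorithm: it assumes the expected reward vectors of all arms for one fixed context are already known, which a learner would not have in practice. The function names (`lexicographic_optimal_arm`, `pareto_front`) and the reward values are hypothetical and chosen only for illustration. The sketch picks the arm that maximizes the non-dominant objective among the arms that maximize the dominant objective, then checks that this arm is not dominated by any other arm.

```python
import numpy as np

def lexicographic_optimal_arm(mu, tol=1e-12):
    """Return the index of the arm that is best in the non-dominant objective
    among all arms that are best in the dominant objective.

    mu: array-like of shape (n_arms, 2); column 0 is the dominant objective,
        column 1 is the non-dominant objective (assumed known here).
    """
    mu = np.asarray(mu, dtype=float)
    best_dominant = mu[:, 0].max()
    # Arms tied (up to tol) for the best expected reward in the dominant objective.
    candidates = np.flatnonzero(mu[:, 0] >= best_dominant - tol)
    # Among those, take the arm with the highest non-dominant expected reward.
    return int(candidates[np.argmax(mu[candidates, 1])])

def pareto_front(mu):
    """Return indices of arms that no other arm dominates.

    Arm j dominates arm i if j is at least as good in both objectives
    and strictly better in at least one.
    """
    mu = np.asarray(mu, dtype=float)
    front = []
    for i, v in enumerate(mu):
        dominated = np.any(np.all(mu >= v, axis=1) & np.any(mu > v, axis=1))
        if not dominated:
            front.append(i)
    return front

# Hypothetical expected rewards for four arms under one context:
# (dominant objective, non-dominant objective)
expected_rewards = [[0.9, 0.2], [0.9, 0.5], [0.6, 0.8], [0.5, 0.7]]

best = lexicographic_optimal_arm(expected_rewards)
front = pareto_front(expected_rewards)
print("optimal arm:", best)
print("Pareto front:", front)
```

In this example, arms 0 and 1 tie in the dominant objective, so the non-dominant objective breaks the tie in favor of arm 1. Arm 2 is also on the Pareto front but is not optimal, because it gives up reward in the dominant objective.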

Abstract

In this paper, we propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective MAB problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context that is provided to the learner at the beginning of each round. We call this problem contextual multi-armed bandit with a dominant objective (CMAB-DO). In CMAB-DO, the goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its total reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the…
