Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos; Paul Grigas; Zheng Wen

arXiv:2104.09750 · cs.LG · April 21, 2021 · 1 cites

TL;DR

This paper introduces a joint online learning and decision-making framework based on dual mirror descent for constrained online revenue maximization. It proves $O(\sqrt{T})$ bounds on regret and constraint violations when the parameters are known, plus additional learning-dependent terms when they are unknown.

Contribution

The paper proposes a novel offline benchmark and an algorithm that combines online dual mirror descent with a generic parameter learning process, with theoretical guarantees on regret and constraint violations when the parameters are unknown.

Findings

01

Achieves $O(\sqrt{T})$ regret when parameters are known.

02

Bounds the possible constraint violations by $O(\sqrt{T})$.

03

Extends results to unknown parameters with additional learning-dependent terms.

Abstract

We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well as an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous $O(\sqrt{T})$ terms plus terms that directly depend on the…
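
To make the abstract's setting concrete, the following is a minimal sketch of an online dual descent loop with lower and upper bounds on average cost. It is an illustration under stated assumptions, not the paper's algorithm: the mirror map is taken to be Euclidean (so the mirror step reduces to a plain subgradient step), there is a single cost constraint, the parameter vector is assumed known (the parameter learning process is omitted entirely), and the names `make_instance`, `run_dual_descent`, `alpha`, `beta`, and `step` are hypothetical.

```python
import numpy as np


def make_instance(T, seed=0):
    """Hypothetical toy instance: action 0 does nothing, action 1 earns
    theta . w_t at a random cost. theta is assumed known here."""
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, 0.5])                 # assumed known parameter vector
    contexts = rng.uniform(0.0, 1.0, size=(T, 2))  # i.i.d. context vectors w_t
    revenues = np.column_stack([np.zeros(T), contexts @ theta])
    costs = np.column_stack([np.zeros(T), rng.uniform(0.5, 1.5, size=T)])
    return revenues, costs


def run_dual_descent(revenues, costs, alpha, beta, step):
    """Aim for an average cost in [alpha, beta] using one scalar dual variable.

    revenues, costs: arrays of shape (T, n_actions).
    Returns (total revenue, total cost, final dual variable).
    """
    lam = 0.0            # dual price on cost; a negative value acts as a subsidy
    total_rev = 0.0
    total_cost = 0.0
    for t in range(revenues.shape[0]):
        # Primal step: pick the action with the best price-adjusted revenue.
        z = int(np.argmax(revenues[t] - lam * costs[t]))
        total_rev += revenues[t, z]
        total_cost += costs[t, z]
        # Dual step: compare the incurred cost with the upper bound when the
        # price is nonnegative and with the lower bound otherwise.
        target = beta if lam >= 0 else alpha
        lam += step * (costs[t, z] - target)
    return total_rev, total_cost, lam
```

A run such as `run_dual_descent(*make_instance(4000), alpha=0.2, beta=0.4, step=1 / np.sqrt(4000))` uses a step size of order $1/\sqrt{T}$, a common choice for this kind of scheme. In this toy instance, always taking action 1 would cost about 1.0 per period on average, so the dual variable has to rise to keep the average cost near the upper bound.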

Videos

Joint Online Learning and Decision-making via Dual Mirror Descent · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms