Joint Online Learning and Decision-making via Dual Mirror Descent
Alfonso Lobos, Paul Grigas, Zheng Wen

TL;DR
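This paper introduces a joint online learning and decision-making framework based on dual mirror descent for online revenue maximization under lower and upper cost bounds. It obtains $O(\sqrt{T})$ regret and constraint-violation bounds when the parameters are known, plus additional learning-dependent terms when they must be learned.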
Contribution
It develops a novel algorithm combining dual mirror descent with parameter learning, providing theoretical guarantees for regret and constraint violations in unknown-parameter settings.
Findings
Achieves $O(\sqrt{T})$ regret when parameters are known.
Bounds the possible constraint violations by $O(\sqrt{T})$ in the same known-parameter setting.
Extends results to unknown parameters with additional learning-dependent terms.
Abstract
We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and needs to make a decision adaptively. The revenue and cost functions depend on the context vector as well as some fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret result as well as an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter is not known and must be learned, we demonstrate that the regret and constraint violations are the sums of the previous terms plus terms that directly depend on the…
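To make the dual-update idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a known parameter vector (so there is no learning step), a scalar cost, a finite action set, and a Euclidean mirror map, under which mirror descent reduces to a plain subgradient step. The names `dual_descent_step` and `run`, the step size $1/\sqrt{T}$, and the toy problem are all hypothetical choices made for illustration.

```python
import math
import random


def dual_descent_step(lam, cost, alpha, beta, eta):
    """One Euclidean dual update for a per-period cost range [alpha, beta].

    Assumption: the dual variable lam is a scalar of unrestricted sign,
    because both a lower and an upper cost bound are present.
    """
    # rho maximizes lam * rho over [alpha, beta], i.e. it picks the active bound.
    rho = beta if lam >= 0 else alpha
    # (rho - cost) is a subgradient of the dual objective at lam; step against it.
    return lam - eta * (rho - cost)


def run(T, actions, revenue, cost, alpha, beta, sample_context, seed=0):
    """Run T periods: price cost with lam, act greedily, then update lam."""
    rng = random.Random(seed)
    eta = 1.0 / math.sqrt(T)  # illustrative step size, not taken from the paper
    lam = 0.0
    total_revenue = 0.0
    total_cost = 0.0
    for _ in range(T):
        w = sample_context(rng)
        # Primal decision: maximize revenue minus the dual price of cost.
        z = max(actions, key=lambda a: revenue(a, w) - lam * cost(a, w))
        c = cost(z, w)
        total_revenue += revenue(z, w)
        total_cost += c
        lam = dual_descent_step(lam, c, alpha, beta, eta)
    return total_revenue, total_cost, lam


if __name__ == "__main__":
    # Toy problem (hypothetical): accept (1) or reject (0) an offer of value w;
    # accepting costs one unit, and average cost should stay within [0.2, 0.5].
    T = 2000
    rev, cst, lam = run(
        T,
        actions=[0, 1],
        revenue=lambda a, w: a * w,
        cost=lambda a, w: float(a),
        alpha=0.2,
        beta=0.5,
        sample_context=lambda rng: rng.random(),
    )
    print(f"avg revenue={rev / T:.3f}, avg cost={cst / T:.3f}, final lam={lam:.3f}")
```

In this toy setting the dual variable acts as a price on cost: it rises while spending exceeds the upper bound and falls while spending is below the active bound, so the greedy primal decision is steered toward the feasible range.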
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
