Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Yuval Emek; Ron Lavi; Rad Niazadeh; Yangguang Shi

arXiv:2005.01869 · cs.GT · June 30, 2020 · 5 citations

TL;DR

This paper introduces dynamic resource allocation with capacity constraints (DRACC), a general online problem studied through posted price mechanisms, together with an online learning framework over deterministic Markov decision processes that achieves vanishing regret for stateful posted pricing.

Contribution

The paper develops a novel online learning framework over deterministic Markov decision processes with dynamic state transition and reward functions, enabling vanishing-regret mechanisms for stateful pricing problems where existing online learning techniques fall short.

Findings

01

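Proves that vanishing regret is achievable whenever the Markov decision process admits an oracle that can simulate any given policy from any initial state with bounded loss, a condition satisfied by the DRACC problem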

02

Reduces the problem to online learning with switching costs

03

Demonstrates applicability to posted prices for online job scheduling and for matching over a dynamic bipartite graph
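Finding 02 mentions online learning with switching costs. As a generic illustration of that setting (not the algorithm from the paper), the sketch below uses blocked, or mini-batched, Hedge: the learner commits to one expert for a whole block of rounds, so it can switch at most once per block boundary. The function name `blocked_hedge` and its parameters `block_len` and `eta` are hypothetical choices for this example, and losses are assumed to lie in [0, 1].

```python
import math
import random
from typing import Optional, Sequence, Tuple


def blocked_hedge(losses: Sequence[Sequence[float]], block_len: int,
                  eta: float, seed: int = 0) -> Tuple[float, int]:
    """Blocked Hedge: commit to one expert for each block of `block_len` rounds.

    losses[t][i] is the (assumed) loss in [0, 1] of expert i at round t.
    Returns (total loss incurred, number of switches between experts).
    """
    rng = random.Random(seed)
    n_experts = len(losses[0])
    weights = [1.0] * n_experts
    total_loss = 0.0
    switches = 0
    current: Optional[int] = None
    for start in range(0, len(losses), block_len):
        block = losses[start:start + block_len]
        # Sample a single expert for the whole block from the Hedge distribution.
        chosen = rng.choices(range(n_experts), weights=weights)[0]
        if current is not None and chosen != current:
            switches += 1
        current = chosen
        # Follow the chosen expert for every round in the block.
        total_loss += sum(row[chosen] for row in block)
        # One multiplicative update per block, using each expert's block loss.
        for i in range(n_experts):
            block_loss = sum(row[i] for row in block)
            weights[i] *= math.exp(-eta * block_loss)
        # Renormalize so the weights do not underflow over long horizons.
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
    return total_loss, switches
```

For example, with 1000 rounds, a block length of 10, and one expert that never incurs loss, the learner settles on that expert after a few blocks and makes at most 99 switches. The block length trades off the number of switches against how quickly the learner reacts to new losses.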

Abstract

In this paper, a rather general online problem called dynamic resource allocation with capacity constraints (DRACC) is introduced and studied in the realm of posted price mechanisms. This problem subsumes several applications of stateful pricing, including but not limited to posted prices for online job scheduling and matching over a dynamic bipartite graph. As the existing online learning techniques do not yield vanishing-regret mechanisms for this problem, we develop a novel online learning framework defined over deterministic Markov decision processes with dynamic state transition and reward functions. We then prove that if the Markov decision process is guaranteed to admit an oracle that can simulate any given policy from any initial state with bounded loss -- a condition that is satisfied in the DRACC problem -- then the online learning problem can be solved with vanishing regret.…
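To make the "stateful" aspect concrete, here is a toy deterministic MDP for posted pricing of a single resource with limited capacity. The model (state = remaining capacity, action = posted price, reward = revenue, buyer values fixed in advance by an adversary) and the names `simulate`, `fixed_price`, and `Policy` are simplifying assumptions for illustration, not the DRACC formulation from the paper. Once the value sequence is fixed, every transition is deterministic, so any policy can be replayed from any initial state.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy model: the state is the number of unsold units.
Policy = Callable[[int], float]  # maps remaining capacity -> posted price


def simulate(policy: Policy, initial_capacity: int,
             values: Sequence[float]) -> Tuple[float, int]:
    """Replay `policy` from `initial_capacity` against a fixed value sequence.

    A buyer purchases one unit iff capacity remains and their value is at
    least the posted price. Returns (total revenue, final capacity).
    """
    capacity = initial_capacity
    revenue = 0.0
    for v in values:
        if capacity == 0:
            break  # absorbing state: nothing left to sell
        price = policy(capacity)
        if v >= price:
            revenue += price
            capacity -= 1
    return revenue, capacity


def fixed_price(p: float) -> Policy:
    """A state-independent policy that always posts price p."""
    return lambda capacity: p
```

With values `[0.2, 0.9, 0.5, 0.7, 0.95, 0.1]` and a fixed price of 0.6, starting from capacity 2 yields revenue 1.2, while starting from capacity 4 yields 1.8. For a fixed price p, the revenue gap between two starting capacities is at most p times their difference. This is only a toy analogue of simulating a policy from a different initial state with bounded loss, not the oracle construction used in the paper.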

Videos

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems