Online Resource Allocation in Episodic Markov Decision Processes

Duksang Lee; William Overman; Dabeen Lee

arXiv:2305.10744 · cs.DS · October 20, 2023 · 1 citation

TL;DR

This paper introduces an online resource allocation framework for episodic Markov decision processes with unknown, non-stationary dynamics, proposing algorithms that achieve near-optimal regret bounds in different feedback regimes.

Contribution

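The paper develops an online dual mirror descent algorithm for resource allocation in episodic MDPs that achieves near-optimal regret bounds under both feedback settings: the newly proposed observe-then-decide regime and the existing decide-then-observe regime.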
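To make the dual-descent idea concrete, here is a minimal sketch for a much simpler setting than the paper's: a single budgeted resource, one decision per period, and no MDP dynamics. It uses the Euclidean mirror map, under which mirror descent reduces to projected subgradient descent on a single resource price. The function name `dual_descent_allocation`, the `(reward, cost)` option format, and the fixed `step_size` are assumptions made for illustration; this is not the authors' algorithm.

```python
def dual_descent_allocation(requests, budget, step_size):
    """Illustrative dual descent for a single-resource online allocation problem.

    requests:  one list of (reward, cost) options per period (hypothetical format);
               doing nothing, i.e. (0, 0), is always allowed.
    budget:    total amount of the resource available over all periods.
    step_size: fixed step size for the dual update (an assumption of this sketch).
    Returns (total_reward, total_consumed, final_price).
    """
    num_periods = len(requests)
    target_rate = budget / num_periods  # budget spread evenly over the periods
    price = 0.0                         # dual variable: price of one resource unit
    remaining = budget
    total_reward = 0.0

    for options in requests:
        # Primal step: pick the feasible option with the best price-adjusted reward.
        best_reward, best_cost = 0.0, 0.0
        for reward, cost in options:
            feasible = cost <= remaining
            if feasible and reward - price * cost > best_reward - price * best_cost:
                best_reward, best_cost = reward, cost

        total_reward += best_reward
        remaining -= best_cost

        # Dual step: subgradient of the dual objective is (target_rate - cost).
        # With the Euclidean mirror map the update is a projection onto price >= 0.
        price = max(0.0, price - step_size * (target_rate - best_cost))

    return total_reward, budget - remaining, price


if __name__ == "__main__":
    # Ten identical requests, each worth 1 and costing 1, with a budget of 5.
    demo_requests = [[(1.0, 1.0)] for _ in range(10)]
    print(dual_descent_allocation(demo_requests, budget=5.0, step_size=0.5))
```

The price rises whenever a period consumes more than the per-period target and falls otherwise, which is what paces consumption against the budget. Extending this to the paper's setting would require replacing the per-period choice with a policy for an episodic MDP and estimating the unknown transition function, neither of which the sketch attempts.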

Findings

01

Achieves near-optimal regret bounds in both the observe-then-decide and decide-then-observe regimes.

02

Demonstrates numerical efficiency on inventory management problems.

03

Provides theoretical analysis for non-stationary, constrained MDPs.

Abstract

This paper studies a long-term resource allocation problem over multiple periods where each period requires a multi-stage decision-making process. We formulate the problem as an online allocation problem in an episodic finite-horizon constrained Markov decision process with an unknown non-stationary transition function and stochastic non-stationary reward and resource consumption functions. We propose the observe-then-decide regime and improve the existing decide-then-observe regime; the two settings differ in how the observations and feedback about the reward and resource consumption functions are given to the decision-maker. We develop an online dual mirror descent algorithm that achieves near-optimal regret bounds for both settings. For the observe-then-decide regime, we prove that the expected regret against the dynamic clairvoyant optimal policy is bounded by $\tilde…

Taxonomy

Topics: Optimization and Search Problems · Age of Information Optimization · Advanced Queuing Theory Analysis