Learning in Online MDPs: Is there a Price for Handling the Communicating Case?
Gautam Chandrasekaran, Ambuj Tewari

TL;DR
This paper demonstrates that online Markov Decision Processes (MDPs) with communicating structure can be learned at an $O(\sqrt{T})$ regret rate under full information. This indicates that the higher regret previously shown under bandit feedback is not forced by the communicating structure alone.
Contribution
It introduces an efficient follow the perturbed leader (FPL) algorithm for deterministic transitions. It then extends the results to stochastic transitions, where regret bounds are obtained under additional conditions such as a restricted initial state distribution.
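For readers unfamiliar with follow the perturbed leader, the sketch below shows the generic idea in its simplest full-information setting, the Experts Problem, and not the paper's MDP algorithm. Each round, every expert's cumulative loss is perturbed with fresh random noise and the learner follows the expert with the smallest perturbed loss. The exponential perturbation, the `1/eta` scale, and the names `fpl_experts` and `regret` are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from typing import List, Sequence


def fpl_experts(
    losses: Sequence[Sequence[float]], eta: float, seed: int = 0
) -> List[int]:
    """Follow the perturbed leader for the full-information Experts Problem.

    losses[t][i] is expert i's loss in round t. Returns the expert chosen
    in each round. Hypothetical helper, not the paper's MDP algorithm.
    """
    rng = random.Random(seed)
    n = len(losses[0])
    cumulative = [0.0] * n  # loss accumulated by each expert so far
    choices: List[int] = []
    for round_losses in losses:
        # Fresh i.i.d. Exp(1) noise, scaled by 1/eta, is subtracted from
        # each cumulative loss; the leader has the smallest perturbed loss.
        perturbed = [c - rng.expovariate(1.0) / eta for c in cumulative]
        leader = min(range(n), key=perturbed.__getitem__)
        choices.append(leader)
        # Full information: the losses of all experts are revealed,
        # not only the loss of the expert that was played.
        for i, loss in enumerate(round_losses):
            cumulative[i] += loss
    return choices


def regret(losses: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """Learner's total loss minus the best fixed expert's loss in hindsight."""
    learner = sum(row[a] for row, a in zip(losses, choices))
    n = len(losses[0])
    best = min(sum(row[i] for row in losses) for i in range(n))
    return learner - best


if __name__ == "__main__":
    T, n = 1000, 5
    rng = random.Random(1)
    losses = [[rng.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(math.log(n) / T)  # a common tuning; constants vary
    choices = fpl_experts(losses, eta)
    print(f"regret after {T} rounds: {regret(losses, choices):.2f}")
```

Redrawing the noise every round keeps the learner's choices hard to predict. The `1/eta` scale trades stability (large noise) against tracking the best expert (small noise). In an MDP with communicating structure, switching between policies also changes the state the learner is in, which this experts sketch does not model.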
Findings
Full information allows $O(\sqrt{T})$ regret in communicating MDPs.
Proposes an efficient FPL algorithm for deterministic transitions.
Achieves $O\left(\sqrt{\frac{T}{\beta}}\right)$ regret for stochastic transitions when the initial state distribution is restricted.
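How the stochastic-transition bound scales can be read off directly. The identities below ignore constants and any other factors hidden in the $O(\cdot)$. They treat $\beta$ only as a fixed positive parameter, since its precise definition is not given in this summary:

$$
\sqrt{\frac{T}{\beta}} = \frac{\sqrt{T}}{\sqrt{\beta}}, \qquad
\sqrt{\frac{4T}{\beta}} = 2\sqrt{\frac{T}{\beta}}, \qquad
\sqrt{\frac{T}{\beta/4}} = 2\sqrt{\frac{T}{\beta}}.
$$

For fixed $\beta$ the bound keeps the $\sqrt{T}$ growth of the full-information rate. Shrinking $\beta$ by a factor of 4 doubles it.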
Abstract
It is a remarkable fact that the same $O(\sqrt{T})$ regret rate can be achieved in both the Experts Problem and the Adversarial Multi-Armed Bandit problem, albeit with a worse dependence on the number of actions in the latter case. In contrast, it has been shown that handling online MDPs with communicating structure and bandit information incurs $\Omega(T^{2/3})$ regret even in the case of deterministic transitions. Is this the price we pay for handling communicating structure, or is it because we also have bandit feedback? In this paper we show that with full information, online MDPs can still be learned at an $O(\sqrt{T})$ rate even in the presence of communicating structure. We first show this by proposing an efficient follow the perturbed leader (FPL) algorithm for the deterministic transition case. We then extend our scope to consider stochastic transitions where we first give an…
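To gauge the size of the gap the abstract asks about, compare $\sqrt{T}$ with the $T^{2/3}$ order of the bandit-feedback lower bound for deterministic communicating MDPs. Constants are ignored, so this is an illustration of growth rates only:

$$
\frac{T^{2/3}}{T^{1/2}} = T^{2/3 - 1/2} = T^{1/6}, \qquad
T = 10^{6} \;\Rightarrow\; \sqrt{T} = 10^{3},\quad T^{2/3} = 10^{4},\quad T^{1/6} = 10.
$$

Both rates are sublinear, so the average regret per round vanishes in either case. It does so at rate $T^{-1/2}$ versus $T^{-1/3}$.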
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques
