Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
Max Simchowitz, Kevin Jamieson

TL;DR
This paper proves that certain optimistic algorithms for episodic MDPs achieve non-asymptotic, gap-dependent logarithmic regret bounds without relying on diameter or ergodicity assumptions, bridging gap-dependent and minimax rates.
Contribution
It introduces a novel 'clipped' regret decomposition technique that provides gap-dependent regret bounds for a broad class of optimistic algorithms in episodic MDPs, independent of diameter-like quantities.
Findings
Achieves logarithmic regret bounds that depend on the gap and are non-asymptotic.
Bounds do not depend on diameter-like quantities or ergodicity assumptions.
Interpolates smoothly between gap-dependent logarithmic regret and the minimax $\tilde{O}(\sqrt{HSAT})$ rate.
Abstract
This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not suffer a dependence on diameter-like quantities or ergodicity, and smoothly interpolate between the gap-dependent logarithmic regret and the $\tilde{O}(\sqrt{HSAT})$-minimax rate. The key technique in our analysis is a novel "clipped" regret decomposition which applies to a broad family of recent optimistic algorithms for episodic MDPs.
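As a rough illustration of what "clipping" can mean in a decomposition like this, the sketch below zeroes out any term that falls below a threshold, so that small terms contribute nothing to the sum. The function name `clip`, the threshold `eps`, and the sample numbers are all assumptions made for illustration; the abstract does not spell out the operator, and this is not the paper's analysis.

```python
def clip(x: float, eps: float) -> float:
    """Hypothetical clipping operator: keep x only if it is at least eps."""
    return x if x >= eps else 0.0


# Hypothetical per-step terms and a hypothetical threshold (e.g. tied to a gap).
terms = [0.50, 0.02, 0.30, 0.01]
eps = 0.05

# Sum with and without clipping; terms below eps drop out of the clipped sum.
unclipped_total = sum(terms)
clipped_total = sum(clip(x, eps) for x in terms)

print(f"unclipped: {unclipped_total:.2f}, clipped: {clipped_total:.2f}")
```

The point of the sketch is only that terms below the threshold vanish, which is one way a sum of such terms can end up depending on a gap-like threshold rather than on every small contribution.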
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
