Improved Algorithms for Misspecified Linear Markov Decision Processes

Daniel Vial; Advait Parulekar; Sanjay Shakkottai; R. Srikant

arXiv:2109.05546 · cs.LG · March 2, 2022 · 1 citation


TL;DR

This paper introduces a novel algorithm for misspecified linear Markov decision processes (MLMDPs) that achieves low regret with bounded space and per-episode time complexity, and does not require the degree of misspecification as input, improving upon existing methods.

Contribution

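To the authors' knowledge, the paper presents the first algorithm for MLMDPs that simultaneously achieves regret scaling as $K \max\{ε_{mis}, ε_{tol}\}$, bounded space and per-episode time complexity, and no need for the misspecification level as input. It generalizes (to MLMDPs) and refines the Sup-Lin-UCB approach.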

Findings

01

Regret scales as $K \max\{ε_{mis}, ε_{tol}\}$, i.e., $K$ times the maximum of the misspecification and the user-specified error tolerance.

02

The algorithm's space and per-episode time complexities remain bounded as the number of episodes $K \to \infty$.

03

For concrete choices of $ε_{tol}$, the algorithm improves existing regret bounds (up to log factors) while also achieving either bounded complexity or independence from $ε_{mis}$.
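To make the regret scaling concrete, the short Python sketch below evaluates only the leading linear-in-$K$ term $K \max\{ε_{mis}, ε_{tol}\}$. It is an illustration under stated assumptions, not the paper's full bound: constants, horizon and dimension factors, log factors, and any lower-order terms are deliberately dropped, and the function name `linear_regret_term` is hypothetical.

```python
def linear_regret_term(num_episodes: int, eps_mis: float, eps_tol: float) -> float:
    """Leading linear-in-K term K * max(eps_mis, eps_tol) from (P1).

    Hypothetical helper: constants, horizon/dimension factors, log factors
    and lower-order terms are omitted; it only illustrates the scaling.
    """
    if num_episodes < 0 or eps_mis < 0 or eps_tol <= 0:
        raise ValueError("need K >= 0, eps_mis >= 0, eps_tol > 0")
    return num_episodes * max(eps_mis, eps_tol)


K = 10_000
eps_mis = 0.01  # unknown to the learner under (P3); fixed here only to evaluate the term
for eps_tol in (0.1, 0.01, 0.001):
    print(eps_tol, linear_regret_term(K, eps_mis, eps_tol))
```

The last two tolerances give the same linear term: once $ε_{tol} \le ε_{mis}$, the misspecification dominates, so shrinking the tolerance further cannot reduce this term.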

Abstract

For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after $K$ episodes scales as $K \max\{ε_{mis}, ε_{tol}\}$, where $ε_{mis}$ is the degree of misspecification and $ε_{tol}$ is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as $K \to \infty$. (P3) It does not require $ε_{mis}$ as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of $ε_{tol}$, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB…
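The abstract says the algorithm generalizes and refines Sup-Lin-UCB but does not give its details here. For orientation only, below is a minimal pure-Python sketch of one arm-selection step of the standard bandit version of Sup-Lin-UCB for linear contextual bandits. It is not the paper's MLMDP algorithm. The names `Layer`, `select_arm`, `alpha`, `reg`, and `tol` are hypothetical. Using `tol` as the "accurate enough, stop exploring" cutoff (the role usually played by a $1/\sqrt{T}$ threshold) and reading it as an analogue of $ε_{tol}$ are our assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


class Layer:
    """One Sup-Lin-UCB layer: ridge-regression statistics for the samples assigned to it."""

    def __init__(self, dim: int, reg: float = 1.0) -> None:
        # A = reg * I, stored through its inverse and updated by Sherman-Morrison.
        self.a_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x: Sequence[float]) -> List[float]:
        return [sum(r * xi for r, xi in zip(row, x)) for row in self.a_inv]

    def width(self, x: Sequence[float]) -> float:
        # Confidence width ||x||_{A^{-1}} (before scaling by alpha).
        return math.sqrt(sum(xi * vi for xi, vi in zip(x, self._a_inv_times(x))))

    def estimate(self, x: Sequence[float]) -> float:
        # Ridge prediction x^T theta_hat with theta_hat = A^{-1} b.
        theta = self._a_inv_times(self.b)
        return sum(xi * ti for xi, ti in zip(x, theta))

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1}x)(A^{-1}x)^T / (1 + x^T A^{-1} x).
        v = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(v)):
            for j in range(len(v)):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(layers: List[Layer], features: Sequence[Sequence[float]],
               alpha: float, tol: float) -> Tuple[int, Optional[int]]:
    """Return (chosen arm, index of the layer that should receive the sample, or None)."""
    if not layers:
        raise ValueError("need at least one layer")
    candidates = list(range(len(features)))
    for s, layer in enumerate(layers, start=1):
        widths = {a: alpha * layer.width(features[a]) for a in candidates}
        ucb = {a: layer.estimate(features[a]) + widths[a] for a in candidates}
        if all(w <= tol for w in widths.values()):
            # Every surviving arm is known to within tol: exploit and store no data.
            return max(candidates, key=ucb.__getitem__), None
        wide = [a for a in candidates if widths[a] > 2.0 ** -s]
        if wide:
            # Some arm is too uncertain for this layer's accuracy 2^{-s}: explore it here.
            return wide[0], s - 1
        # All widths <= 2^{-s}: keep only arms within 2^{1-s} of the best upper bound.
        best = max(ucb.values())
        candidates = [a for a in candidates if ucb[a] >= best - 2.0 ** (1 - s)]
    # Unreachable when 2^{-len(layers)} <= tol; kept as a defensive fallback.
    return max(candidates, key=ucb.__getitem__), None
```

In this bandit sketch, each layer keeps only a $d \times d$ inverse and a length-$d$ vector, so with about $\lceil \log_2(1/\mathrm{tol}) \rceil$ layers the stored state does not grow with the number of rounds. How the paper's MLMDP version achieves (P2) is not shown here.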


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization