Improved Algorithms for Misspecified Linear Markov Decision Processes
Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

TL;DR
This paper introduces an algorithm for misspecified linear Markov decision processes (MLMDPs) that achieves low regret at bounded computational cost without requiring prior knowledge of the misspecification level, improving on existing methods.
Contribution
The paper presents the first algorithm for MLMDPs that simultaneously satisfies a regret bound, has bounded space and per-episode time complexity, and does not need the degree of misspecification as input. It generalizes and refines the Sup-Lin-UCB approach.
Findings
Regret after $K$ episodes scales as $K$ times the maximum of the degree of misspecification and the user-specified error tolerance.
The algorithm's space and per-episode time complexities remain bounded as the number of episodes grows.
For specific tolerance choices, the algorithm improves existing regret bounds up to log factors.
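The first finding can be illustrated with a minimal sketch. It evaluates only the leading-order scaling, the number of episodes times the larger of the two error terms. Constants, log factors, and any dependence on other problem parameters are omitted. The function name `regret_scale` and its parameter names are hypothetical and do not come from the paper.

```python
def regret_scale(num_episodes: int, eps_mis: float, eps_tol: float) -> float:
    """Leading-order regret scaling: K * max(eps_mis, eps_tol).

    Illustrative only: constants, log factors, and other problem-dependent
    terms are deliberately left out.
    """
    return num_episodes * max(eps_mis, eps_tol)


if __name__ == "__main__":
    # Hypothetical values: fix the misspecification and vary the tolerance.
    for eps_tol in (0.2, 0.05, 0.01):
        print(eps_tol, regret_scale(1000, eps_mis=0.05, eps_tol=eps_tol))
```

Under this scaling, shrinking the tolerance below the degree of misspecification no longer reduces the leading term, because the misspecification then dominates the maximum.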
Abstract
For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after $K$ episodes scales as $K \max\{\varepsilon_{\text{mis}}, \varepsilon_{\text{tol}}\}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as $K \rightarrow \infty$. (P3) It does not require $\varepsilon_{\text{mis}}$ as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of $\varepsilon_{\text{tol}}$, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
