Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles
Noah Golowich, Ankur Moitra, Dhruv Rohatgi

TL;DR
This paper introduces a polynomial-time algorithm for learning in sparse linear MDPs without relying on intractable oracles, advancing feature selection and efficient policy learning in high-dimensional settings.
Contribution
It presents the first polynomial-time algorithm for sparse linear MDPs, utilizing the concept of emulators for efficient Bellman backup computations.
Findings
Existence of efficiently computable emulators for transition representations.
The algorithm learns a near-optimal policy using only poly(k, log d) interactions with the environment, where k is the sparsity and d is the feature dimension.
Extension to block MDPs whose decoding function is a low-depth decision tree, enabling sample-efficient learning.
Abstract
The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map that maps state-action pairs to d-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the "kitchen sink" approach and hope that the true features are included in a much larger set of potential features. In this paper we revisit linear MDPs from the perspective of feature selection. In a k-sparse linear MDP, there is an unknown subset S of size k containing all the relevant features, and the goal is to learn a near-optimal policy in only poly(k, log d) interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast,…
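To make the sparsity assumption concrete, here is a minimal sketch of a toy k-sparse linear MDP in NumPy. It is an illustration of the definition only, not the paper's algorithm. All names and sizes (`phi`, `mu`, `theta`, `n_states`, and so on) are assumptions made for this example, and the construction (Dirichlet-distributed features on the relevant coordinates) is just one convenient way to obtain valid transition probabilities. The sketch checks that a one-step Bellman backup of an arbitrary value function is a linear function of the features whose weight vector is supported on the hidden set S.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2   # tiny tabular environment (illustrative sizes)
d, k = 12, 3                 # ambient feature dimension and sparsity
# Relevant coordinates; a learner would not be told which ones these are.
S = np.sort(rng.choice(d, size=k, replace=False))

# Features phi(x, a) in R^d. On the coordinates in S each feature vector is a
# probability distribution over k latent factors; the rest are distractors.
phi = rng.uniform(size=(n_states, n_actions, d))
phi[:, :, S] = rng.dirichlet(np.ones(k), size=(n_states, n_actions))

# mu[i] is a distribution over next states attached to relevant coordinate i.
mu = rng.dirichlet(np.ones(n_states), size=k)   # shape (k, n_states)
theta = rng.uniform(size=k)                     # reward weights on S

# Transitions and rewards depend on phi only through the coordinates in S.
P = phi[:, :, S] @ mu       # shape (n_states, n_actions, n_states)
r = phi[:, :, S] @ theta    # shape (n_states, n_actions)

def bellman_backup(V):
    """Return Q(x, a) = r(x, a) + E[V(x') | x, a] for a value vector V."""
    return r + P @ V

V = rng.uniform(size=n_states)
Q = bellman_backup(V)

# The same backup as a linear function of phi with a k-sparse weight vector.
w = np.zeros(d)
w[S] = theta + mu @ V
Q_linear = phi @ w

print("rows of P sum to 1:", np.allclose(P.sum(axis=-1), 1.0))
print("backup is linear in phi:", np.allclose(Q, Q_linear))
print("nonzero weights:", np.count_nonzero(w), "of", d)
```

In this toy setting the weight vector `w` is easy to write down because `S`, `mu`, and `theta` are known. The feature-selection problem described in the abstract is the harder case where only `phi` and interaction with the environment are available.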
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling
