Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning
Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau

TL;DR
This paper introduces a unified approach to learning and planning in partially observable environments. It uses spectral learning and unnormalized Q functions to improve sample and time efficiency while retaining theoretical guarantees.
Contribution
The paper proposes a novel algorithm that integrates learning and planning. The algorithm is closely related to spectral learning for predictive state representations and comes with theoretical guarantees and favorable time complexity.
Findings
More sample-efficient than classical two-stage (learn, then plan) methods
Lower computation time than classical methods
Both gains shown empirically on two domains
Abstract
Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning. Traditional methods consider these two problems as independent, resulting in a classical two-stage paradigm: first learn the environment dynamics and then plan accordingly. This approach, however, disconnects the two problems and can consequently lead to algorithms that are sample inefficient and time consuming. In this paper, we propose a novel algorithm that combines learning and planning. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample and time efficient than classical methods.
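For readers unfamiliar with the spectral learning family the abstract refers to, the following is a minimal sketch of the classical spectral method for recovering a weighted automaton from a Hankel matrix. It is not the paper's algorithm, whose details are not given on this page. The toy target model, the two-symbol alphabet, the prefix/suffix basis, and all function names (`wfa_value`, `spectral_learn`, `target`) are assumptions made for illustration.

```python
import numpy as np

def wfa_value(alpha, ops, omega, word):
    """Evaluate alpha^T A_{w1} ... A_{wk} omega for a word given as a string."""
    v = alpha
    for sym in word:
        v = v @ ops[sym]
    return float(v @ omega)

def spectral_learn(f, prefixes, suffixes, alphabet, rank):
    """Classical spectral recovery of a weighted automaton from Hankel blocks.

    f maps a string to a real value; prefixes and suffixes must both
    contain the empty string. This is an illustrative sketch only.
    """
    # Hankel matrix H[p, s] = f(p s) and one shifted block per symbol.
    H = np.array([[f(p + s) for s in suffixes] for p in prefixes])
    H_sym = {a: np.array([[f(p + a + s) for s in suffixes] for p in prefixes])
             for a in alphabet}
    h_P = np.array([f(p) for p in prefixes])  # column of H for the empty suffix
    h_S = np.array([f(s) for s in suffixes])  # row of H for the empty prefix

    # Top right singular vectors span the row space of H.
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    V = Vt[:rank].T
    HV_pinv = np.linalg.pinv(H @ V)

    alpha = h_S @ V
    ops = {a: HV_pinv @ H_sym[a] @ V for a in alphabet}
    omega = HV_pinv @ h_P
    return alpha, ops, omega

# Hypothetical 2-state target model used only to generate exact Hankel entries.
true_alpha = np.array([1.0, 0.0])
true_ops = {"a": np.array([[0.5, 0.2], [0.1, 0.3]]),
            "b": np.array([[0.1, 0.4], [0.3, 0.2]])}
true_omega = np.array([1.0, 0.5])

def target(word):
    return wfa_value(true_alpha, true_ops, true_omega, word)

basis = ["", "a", "b"]
alpha_hat, ops_hat, omega_hat = spectral_learn(target, basis, basis, "ab", rank=2)
```

With exact Hankel entries and a basis whose Hankel matrix has the same rank as the target model, the recovered parameters compute the same function as the target up to a change of basis, so `wfa_value(alpha_hat, ops_hat, omega_hat, w)` should agree with `target(w)` for any word `w`. In practice the entries are estimated from sampled trajectories, which is where sample efficiency becomes the central concern.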
Taxonomy
Topics: Reinforcement Learning in Robotics · Gene Regulatory Network Analysis · Receptor Mechanisms and Signaling
