Addressing Finite-Horizon MDPs via Low-Rank Tensor Value Approximation
Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques

TL;DR
This paper introduces a low-rank tensor approximation approach for efficiently learning optimal policies in finite-horizon MDPs, addressing high-dimensionality and non-stationarity challenges.
Contribution
It proposes a scalable low-rank tensor framework for policy evaluation and improvement, with algorithms that have theoretical convergence guarantees and applicability to unknown dynamics.
Findings
Reduces computational complexity in synthetic and resource allocation scenarios.
Achieves competitive policy performance with bounded approximation errors.
Provides convergence guarantees for low-rank Bellman equation solvers.
Abstract
We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs), are not stationary. This aggravates the challenges of high-dimensional MDPs, as they suffer from the curse of dimensionality and high sample complexity. To address these issues, we propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable. Our approach focuses on VF approximation within a policy iteration framework, where low-rank policy evaluation is combined with greedy policy improvement to compute near-optimal policies. We introduce an optimization-based framework for solving the Bellman equations with low-rank constraints, along with…
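To make the idea of a low-rank tensor VF concrete, the following is a minimal sketch, not the authors' implementation. It assumes a CP (PARAFAC-style) factorization in which the time step, two state dimensions, and the action each get their own factor matrix; the abstract is truncated and does not confirm this exact parametrization. The sizes `H`, `S1`, `S2`, `A`, the rank `R`, and the helper names are all hypothetical, and the factors are random rather than learned.

```python
import numpy as np

def cp_q_values(factors):
    """Rebuild the full Q tensor from CP factors (for illustration only;
    a scalable method would avoid materializing this tensor)."""
    time_f, s1_f, s2_f, act_f = factors
    # Q[h, i, j, a] = sum_r time_f[h, r] * s1_f[i, r] * s2_f[j, r] * act_f[a, r]
    return np.einsum("hr,ir,jr,ar->hija", time_f, s1_f, s2_f, act_f)

def greedy_action(factors, h, s1, s2):
    """Greedy policy improvement at time h and state (s1, s2),
    computed directly from the factors without building the full tensor."""
    time_f, s1_f, s2_f, act_f = factors
    weights = time_f[h] * s1_f[s1] * s2_f[s2]  # shape (R,)
    return int(np.argmax(act_f @ weights))     # argmax over the A actions

# Hypothetical problem sizes: horizon, two state dimensions, actions, rank.
H, S1, S2, A, R = 5, 10, 10, 4, 3
rng = np.random.default_rng(0)
factors = [rng.normal(size=(n, R)) for n in (H, S1, S2, A)]

n_lowrank = sum(f.size for f in factors)  # (H + S1 + S2 + A) * R parameters
n_full = H * S1 * S2 * A                  # one entry per (h, s1, s2, a)
print(n_lowrank, n_full)
```

Because time is treated as one more tensor dimension, the non-stationary VF is stored with a number of parameters that grows with the sum of the dimension sizes rather than their product, which is the scalability argument the abstract makes.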
