Tensor Low-rank Approximation of Finite-horizon Value Functions
Sergio Rozada, Antonio G. Marques

TL;DR
This paper introduces a low-rank tensor approximation method, based on the PARAFAC decomposition, to efficiently estimate finite-horizon value functions (VFs) in reinforcement learning. It addresses the fact that the number of VFs to estimate grows with the time horizon as well as with the size of the state-action space.
Contribution
It proposes a non-parametric, online low-rank tensor algorithm that approximates the value functions of finite-horizon MDPs using tensor decomposition.
Findings
Numerical experiments demonstrate efficient approximation of finite-horizon VFs.
The number of parameters in the low-rank model scales additively, rather than multiplicatively, with the tensor dimensions, which improves computational efficiency.
The method accurately recovers value functions from sampled rewards.
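The additive scaling noted above can be illustrated with a quick parameter count. This is a minimal sketch, not taken from the paper: the dimension sizes and the rank below are hypothetical, and the helper names are ours. A full tensor stores one entry per combination of indices, whereas a rank-R PARAFAC model stores one factor matrix of size (dimension x R) per mode.

```python
import math

def full_tensor_size(dims):
    """Number of entries in a dense tensor with the given mode sizes."""
    return math.prod(dims)

def parafac_size(dims, rank):
    """Number of parameters in a rank-`rank` PARAFAC model:
    one (d x rank) factor matrix per mode."""
    return rank * sum(dims)

# Hypothetical sizes: two state dimensions, one action dimension, and the horizon.
dims = (10, 10, 5, 20)
rank = 4

print(full_tensor_size(dims))    # product of the mode sizes
print(parafac_size(dims, rank))  # rank times the sum of the mode sizes
```

With these illustrative sizes the dense tensor has 10,000 entries, while the PARAFAC model has 180 parameters.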
Abstract
The goal of reinforcement learning is estimating a policy that maps states to actions and maximizes the cumulative reward of a Markov Decision Process (MDP). This is oftentimes achieved by estimating first the optimal (reward) value function (VF) associated with each state-action pair. When the MDP has an infinite horizon, the optimal VFs and policies are stationary under mild conditions. However, in finite-horizon MDPs, the VFs (hence, the policies) vary with time. This poses a challenge since the number of VFs to estimate grows not only with the size of the state-action space but also with the time horizon. This paper proposes a non-parametric low-rank stochastic algorithm to approximate the VFs of finite-horizon MDPs. First, we represent the (unknown) VFs as a multi-dimensional array, or tensor, where time is one of the dimensions. Then, we use rewards sampled from the MDP to…
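The idea of treating time as one more tensor dimension can be sketched in a few lines of NumPy. This is an illustrative sketch under our own assumptions, not the authors' algorithm: the (state, action, time) index layout, the function names, the step size, and the Q-learning-style target are all hypothetical choices. Each VF entry is modeled as a PARAFAC entry (a sum over rank components of the product of one factor row per mode), and a sampled reward drives a stochastic gradient step on only the factor rows that the sample touches.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_factors(dims, rank, scale=0.1):
    """One (d x rank) factor matrix per tensor mode, randomly initialized."""
    return [scale * rng.standard_normal((d, rank)) for d in dims]

def q_value(factors, index):
    """PARAFAC entry: sum over components of the product of the selected rows."""
    rows = np.stack([F[i] for F, i in zip(factors, index)])
    return float(np.prod(rows, axis=0).sum())

def td_target(factors, reward, next_state, h, horizon, n_actions):
    """Finite-horizon target (assumed layout: index = (state, action, time)).
    At the last step there is no future value to bootstrap from."""
    if h == horizon - 1:
        return reward
    return reward + max(
        q_value(factors, (next_state, a, h + 1)) for a in range(n_actions)
    )

def td_update(factors, index, target, lr):
    """One stochastic gradient step on the squared error at a single entry.
    Only the factor rows selected by `index` are modified (in place)."""
    rows = np.stack([F[i] for F, i in zip(factors, index)])  # copy of current rows
    err = target - np.prod(rows, axis=0).sum()
    for m, (F, i) in enumerate(zip(factors, index)):
        # Gradient of the entry w.r.t. mode m's row: product of the other modes' rows.
        others = np.prod(np.delete(rows, m, axis=0), axis=0)
        F[i] += lr * err * others
    return float(err)
```

In an online loop one would, for each sampled transition (s, a, r, s') at step h, compute `td_target` and then call `td_update` with index `(s, a, h)`. Because the time index selects a row of its own factor matrix, the estimate at each step of the horizon can differ while still sharing the state and action factors.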
Taxonomy
Topics: Tensor decomposition and applications · Sparse and Compressive Sensing Techniques
