Matrix Low-Rank Approximation For Policy Gradient Methods
Sergio Rozada, Antonio G. Marques

TL;DR
This paper introduces low-rank matrix models for policy gradient methods in reinforcement learning, offering an efficient alternative to neural networks by reducing computational and sample complexities while maintaining performance.
Contribution
It proposes a novel low-rank matrix approach to estimate policy parameters, addressing challenges of neural network architectures and convergence in policy gradient methods.
Findings
Low-rank matrix models reduce computational complexity.
Sample efficiency improves with the matrix approach.
Performance is comparable to that of neural network-based policies.
Abstract
Estimating a policy that maps states to actions is a central problem in reinforcement learning. Traditionally, policies are inferred from the so-called value functions (VFs), but exact VF computation suffers from the curse of dimensionality. Policy gradient (PG) methods bypass this by directly learning a parametric stochastic policy. Typically, the parameters of the policy are estimated using neural networks (NNs) tuned via stochastic gradient descent. However, finding adequate NN architectures can be challenging, and convergence issues are common as well. In this paper, we put forth low-rank matrix-based models to efficiently estimate the parameters of PG algorithms. We collect the parameters of the stochastic policy into a matrix and then leverage matrix-completion techniques to promote (enforce) low rank. We demonstrate via numerical studies how low-rank matrix-based policy…
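To make the idea in the abstract concrete, the following is a minimal sketch of a policy whose parameters are collected into a matrix that is forced to be low rank, trained with a plain REINFORCE update. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the discrete state and action spaces, the choice of rows as states and columns as actions, the softmax over matrix entries, the explicit factorization into two thin factors `L` and `R` as the way to enforce low rank, and all names (`LowRankSoftmaxPolicy`, `reinforce_step`, `rank`, `lr`, `gamma`).

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


class LowRankSoftmaxPolicy:
    """Hypothetical policy: the (n_states x n_actions) logit matrix is L @ R.T.

    Storing only L (n_states x rank) and R (n_actions x rank) enforces
    rank <= `rank` by construction. This is an illustrative assumption,
    not necessarily the parametrization used in the paper.
    """

    def __init__(self, n_states, n_actions, rank, rng, scale=0.1):
        self.L = scale * rng.standard_normal((n_states, rank))
        self.R = scale * rng.standard_normal((n_actions, rank))

    def probs(self, s):
        # Row s of the low-rank logit matrix, turned into action probabilities.
        return softmax(self.L[s] @ self.R.T)

    def sample(self, s, rng):
        return int(rng.choice(self.R.shape[0], p=self.probs(s)))

    def grad_log_prob(self, s, a):
        # Gradient of log pi(a|s) w.r.t. the logits is onehot(a) - pi(.|s).
        g = -self.probs(s)
        g[a] += 1.0
        grad_L_row = g @ self.R            # shape (rank,), only row s of L is used
        grad_R = np.outer(g, self.L[s])    # shape (n_actions, rank)
        return grad_L_row, grad_R

    def reinforce_step(self, episode, lr=0.1, gamma=0.99):
        """One gradient-ascent step from a list of (state, action, reward).

        Uses the discounted return-to-go as the weight of each score term
        (the common REINFORCE variant without the extra gamma**t factor).
        """
        dL = np.zeros_like(self.L)
        dR = np.zeros_like(self.R)
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + gamma * G
            gL, gR = self.grad_log_prob(s, a)
            dL[s] += G * gL
            dR += G * gR
        self.L += lr * dL
        self.R += lr * dR
```

Under these assumptions, a problem with 100 states and 10 actions at rank 2 stores 2 × (100 + 10) = 220 parameters instead of the 1000 entries of the full matrix, which is the kind of reduction the low-rank model is meant to provide. A policy would be built with `LowRankSoftmaxPolicy(100, 10, 2, np.random.default_rng(0))`, rolled out with `sample`, and updated with `reinforce_step` once per episode.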
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Age of Information Optimization
