Matrix Low-Rank Trust Region Policy Optimization
Sergio Rozada, Antonio G. Marques

TL;DR
This paper proposes a low-rank matrix approach to improve Trust Region Policy Optimization in reinforcement learning, reducing computational costs while maintaining performance.
Contribution
It introduces a low-rank matrix model for the stochastic policy in TRPO, offered as a more efficient alternative to neural network policies in reinforcement learning.
Findings
Low-rank matrix models reduce computational complexity.
Sample efficiency is improved with the new approach.
Performance remains comparable to neural network policies.
Abstract
Most methods in reinforcement learning use a Policy Gradient (PG) approach to learn a parametric stochastic policy that maps states to actions. The standard approach is to implement such a mapping via a neural network (NN) whose parameters are optimized using stochastic gradient descent. However, PG methods are prone to large policy updates that can render learning inefficient. Trust region algorithms, like Trust Region Policy Optimization (TRPO), constrain the policy update step, ensuring monotonic improvements. This paper introduces low-rank matrix-based models as an efficient alternative for estimating the parameters of TRPO algorithms. By gathering the stochastic policy's parameters into a matrix and applying matrix-completion techniques, we promote and enforce low rank. Our numerical studies demonstrate that low-rank matrix-based policy models effectively reduce both computational…
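The idea of gathering a policy's parameters into a matrix and enforcing low rank can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-dimensional state discretized into a hypothetical 100 x 100 grid, a Gaussian policy whose mean for grid cell (i, j) is one matrix entry, and a rank enforced by factorizing that matrix as L @ R.T. All names (init_low_rank, policy_mean, gaussian_kl, num_params) are illustrative.

```python
import numpy as np

def init_low_rank(n_rows, n_cols, rank, seed=0):
    """Return factors L (n_rows x rank) and R (n_cols x rank).

    The policy-mean matrix is never stored explicitly; it is L @ R.T,
    so its rank is at most `rank` by construction.
    """
    rng = np.random.default_rng(seed)
    L = 0.1 * rng.standard_normal((n_rows, rank))
    R = 0.1 * rng.standard_normal((n_cols, rank))
    return L, R

def policy_mean(L, R, i, j):
    """Mean action for the state that falls in grid cell (i, j)."""
    return float(L[i] @ R[j])

def sample_action(L, R, i, j, log_std, rng):
    """Draw an action from the Gaussian policy N(mean, exp(log_std)^2)."""
    return policy_mean(L, R, i, j) + np.exp(log_std) * rng.standard_normal()

def gaussian_kl(mu_old, mu_new, log_std):
    """KL divergence between two Gaussians sharing the same std.

    A trust-region method keeps the average of this quantity below a
    threshold when moving from the old policy to the new one.
    """
    return (mu_old - mu_new) ** 2 / (2.0 * np.exp(2.0 * log_std))

def num_params(n_rows, n_cols, rank):
    """Parameters in the factorized model: rank * (n_rows + n_cols)."""
    return rank * (n_rows + n_cols)

# Hypothetical sizes: a 100 x 100 state grid and rank 3.
L, R = init_low_rank(100, 100, 3)
rng = np.random.default_rng(1)
action = sample_action(L, R, 5, 7, log_std=-1.0, rng=rng)
```

Under these assumptions the factorized model stores rank * (n_rows + n_cols) values instead of the n_rows * n_cols entries of a full table, which is the kind of parameter saving a low-rank model is meant to provide.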
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Cryptography and Data Security · Advanced Memory and Neural Computing
Methods: Trust Region Policy Optimization
