Matrix Low-Rank Trust Region Policy Optimization

Sergio Rozada; Antonio G. Marques

arXiv:2405.17625 · cs.LG · May 29, 2024

TL;DR

This paper proposes a low-rank matrix approach to improve Trust Region Policy Optimization in reinforcement learning, reducing computational costs while maintaining performance.

Contribution

The paper introduces a low-rank matrix model for the policy parameters in TRPO, offered as a more efficient alternative to neural network policies in reinforcement learning.

Findings

01. Low-rank matrix models reduce computational complexity.

02. Sample efficiency is improved with the new approach.

03. Performance remains comparable to neural network policies.

Abstract

Most methods in reinforcement learning use a Policy Gradient (PG) approach to learn a parametric stochastic policy that maps states to actions. The standard approach is to implement such a mapping via a neural network (NN) whose parameters are optimized using stochastic gradient descent. However, PG methods are prone to large policy updates that can render learning inefficient. Trust region algorithms, like Trust Region Policy Optimization (TRPO), constrain the policy update step, ensuring monotonic improvements. This paper introduces low-rank matrix-based models as an efficient alternative for estimating the parameters of TRPO algorithms. By gathering the stochastic policy's parameters into a matrix and applying matrix-completion techniques, we promote and enforce low rank. Our numerical studies demonstrate that low-rank matrix-based policy models effectively reduce both computational…
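The two ideas in the abstract, a low-rank matrix of policy parameters and a constrained policy update, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the policy is a softmax over a state-by-action logit matrix factorized as `L @ R.T`, the state space is discrete, the update direction is random rather than a real policy gradient, and the trust region is enforced by simple backtracking on the mean KL divergence. Full TRPO also computes a natural-gradient direction and checks surrogate improvement, which this sketch omits.

```python
import numpy as np

def low_rank_policy(L, R):
    """Row-stochastic policy matrix from low-rank logits L @ R.T (assumed model)."""
    logits = L @ R.T                               # (n_states, n_actions), rank <= k
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def mean_kl(p_old, p_new):
    """KL(p_old || p_new) averaged over states."""
    return float(np.mean(np.sum(p_old * (np.log(p_old) - np.log(p_new)), axis=1)))

def trust_region_step(L, R, dL, dR, delta=0.01, shrink=0.5, max_tries=20):
    """Backtrack along (dL, dR) until the mean KL to the old policy is <= delta."""
    p_old = low_rank_policy(L, R)
    step = 1.0
    for _ in range(max_tries):
        L_new, R_new = L + step * dL, R + step * dR
        if mean_kl(p_old, low_rank_policy(L_new, R_new)) <= delta:
            return L_new, R_new, step
        step *= shrink
    return L, R, 0.0                               # reject the update entirely

rng = np.random.default_rng(0)
n_states, n_actions, k = 100, 10, 3                # hypothetical problem sizes
L = 0.1 * rng.standard_normal((n_states, k))
R = 0.1 * rng.standard_normal((n_actions, k))
dL = rng.standard_normal(L.shape)                  # placeholder direction, not a gradient
dR = rng.standard_normal(R.shape)

L_new, R_new, step = trust_region_step(L, R, dL, dR)
n_params_low_rank = L.size + R.size                # 100*3 + 10*3 = 330
n_params_full = n_states * n_actions               # 100*10 = 1000
```

With these hypothetical sizes the factorized model stores 330 parameters instead of the 1000 needed for a full logit matrix, which is the kind of saving a rank constraint is meant to provide.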

Code & Models

Repositories

sergiorozada12/matrix-low-rank-trpo (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Cryptography and Data Security · Advanced Memory and Neural Computing

Methods: Trust Region Policy Optimization