Identifying Policy Gradient Subspaces

Jan Schneider; Pierre Schumacher; Simon Guist; Le Chen; Daniel Häufle; Bernhard Schölkopf; Dieter Büchler

arXiv:2401.06604 · cs.LG · March 19, 2024 · 1 citation

TL;DR

This paper investigates the existence of low-dimensional, slowly-changing gradient subspaces in policy gradient methods, revealing opportunities to enhance reinforcement learning efficiency through better exploration and optimization strategies.

Contribution

It provides a comprehensive evaluation of gradient subspaces in deep policy gradient methods across multiple benchmarks, confirming their presence despite dynamic data distributions.

Findings

01 Gradient subspaces exist in policy gradient methods.

02 These subspaces are low-dimensional and change slowly over time.

03 They offer potential for improved parameter-space exploration and for second-order optimization.
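
One way to make "changes slowly over time" concrete is to compare the subspace found at an early checkpoint with the one found later. The sketch below is an illustration only: the `subspace_overlap` definition (mean squared projection of one orthonormal basis onto the other) and the toy `rotated_basis` helper are assumptions made here, not necessarily the metric or setup used in the paper.

```python
import numpy as np

def subspace_overlap(basis_a, basis_b):
    """Mean squared projection of the columns of basis_b onto span(basis_a).

    Both inputs are (n, k) matrices with orthonormal columns. The result is
    1.0 for identical subspaces and 0.0 for mutually orthogonal ones.
    """
    k = basis_b.shape[1]
    return float(np.linalg.norm(basis_a.T @ basis_b, "fro") ** 2 / k)

def rotated_basis(n, k, angle):
    """Hypothetical toy basis: rotate axis i by `angle` towards axis i + k.

    Requires 2 * k <= n so that the columns stay orthonormal.
    """
    basis = np.zeros((n, k))
    for i in range(k):
        basis[i, i] = np.cos(angle)
        basis[i + k, i] = np.sin(angle)
    return basis

n, k = 20, 4
early = rotated_basis(n, k, 0.0)        # subspace at an early checkpoint
later = rotated_basis(n, k, 0.1)        # slightly drifted subspace
far = rotated_basis(n, k, np.pi / 2)    # fully rotated subspace

print("same subspace:   ", subspace_overlap(early, early))
print("small drift:     ", subspace_overlap(early, later))
print("orthogonal drift:", subspace_overlap(early, far))
```

In this toy setting a small rotation keeps the overlap close to 1, which is the kind of behaviour a slowly changing subspace would show; an overlap near 0 would mean the earlier subspace has become useless.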

Abstract

Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace. In this paper, we conduct a thorough evaluation of this phenomenon for two popular deep policy gradient methods on various simulated benchmark tasks. Our results demonstrate the existence of such gradient subspaces despite the continuously changing data distribution inherent to reinforcement learning. These findings reveal promising directions for future work on more efficient reinforcement learning, e.g., through improving parameter-space exploration or enabling second-order optimization.
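
The idea that gradients lie in a low-dimensional subspace can be illustrated on a toy problem. The sketch below assumes the subspace is spanned by the highest-curvature directions (the top Hessian eigenvectors) and measures what fraction of the squared gradient norm falls inside it. The quadratic loss, the dimensions, and the function names are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def top_k_subspace(hessian, k):
    """Return an (n, k) orthonormal basis of the k largest-eigenvalue directions."""
    _, eigvecs = np.linalg.eigh(hessian)  # eigenvalues come back in ascending order
    return eigvecs[:, -k:]

def gradient_subspace_fraction(grad, basis):
    """Fraction of the squared gradient norm that lies inside span(basis)."""
    projected = basis.T @ grad
    return float((projected @ projected) / (grad @ grad))

# Toy quadratic loss 0.5 * theta^T H theta with k high-curvature directions.
rng = np.random.default_rng(0)
n, k = 50, 3
q, _ = np.linalg.qr(rng.standard_normal((n, n)))       # random orthonormal eigenbasis
curvatures = np.concatenate([np.full(k, 100.0), np.full(n - k, 0.1)])
H = (q * curvatures) @ q.T                             # H = Q diag(curvatures) Q^T

theta = q @ np.ones(n)        # equal displacement along every eigen-direction
grad = H @ theta              # gradient of the quadratic loss at theta

basis = top_k_subspace(H, k)
fraction = gradient_subspace_fraction(grad, basis)

# Baseline: a random direction has no reason to align with the subspace.
random_fraction = gradient_subspace_fraction(rng.standard_normal(n), basis)

print("gradient fraction in top-k subspace:", fraction)
print("random-vector fraction (baseline):  ", random_fraction)
```

Here 3 of 50 directions capture almost all of the gradient because the gradient of a quadratic scales with curvature. Whether the same holds for a policy gradient estimated from continually changing data is the empirical question the paper examines.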

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 · accept, good paper · Confidence 3

Strengths

**Originality and significance:** The paper shows empirically, for the first time, that the policy gradients (and the critic gradients) of the widely popular deep RL methods PPO and SAC lie in a very small subspace, and that this subspace remains somewhat stable over the course of learning. This result is interesting in itself, as it gives additional insight into deep RL methods. Further, such a result can have significant implications for developing policy gradient methods, as outlined by t

Weaknesses

# Major issues (these affect my score significantly):

The major weakness of the paper is a lack of rigor and concrete results.

## 1. There is a very weak link between the experimental results and the claims made in the paper:

## (a) For example, see the following claims of the paper:

- Section 1 (last paragraph): "(i) parameter-space directions with significantly larger curvature exist in PG": --> why can we say that the curvature is significantly larger? Figure 1 is highly qualitative in na

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 2

Strengths

Current literature on identifying gradient subspaces focuses on supervised learning, and related work in RL focuses on identifying parameter subspaces rather than gradient subspaces. This work is therefore the first to identify gradient subspaces in the context of RL, and it is informative for training RL algorithms. Project code is provided for reproducibility.

Weaknesses

It is unclear how significant the contribution of identifying gradient subspaces is compared to existing work in RL that identifies parameter subspaces (e.g. Gaya et al 2023 in the references), since both approaches share the goal of improving the training efficiency of policy parameters.

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

- The paper is clearly presented and easy to follow; the motivation and the methodology are clearly described.
- The work demonstrates that the subspace exists in policy gradient methods as well, similar to what has been found in the supervised learning setup. This is done through relatively comprehensive experiments that include different approaches to estimating the policy gradient and Hessian matrix and that consider both the policy model and the critic model.
- The authors also discuss how to l

Weaknesses

The paper would be stronger and more convincing if it presented conclusions and domain-specific insights that are unique to policy gradient learning, since most of its conclusions carry over from the gradient subspace paper in the supervised learning setup.

Videos

Identifying Policy Gradient Subspaces · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning