Identifying Policy Gradient Subspaces
Jan Schneider, Pierre Schumacher, Simon Guist, Le Chen, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler

TL;DR
This paper investigates the existence of low-dimensional, slowly-changing gradient subspaces in policy gradient methods, revealing opportunities to enhance reinforcement learning efficiency through better exploration and optimization strategies.
Contribution
It provides a comprehensive evaluation of gradient subspaces in deep policy gradient methods across multiple benchmarks, confirming their presence despite dynamic data distributions.
Findings
Gradient subspaces exist in policy gradient methods.
Subspaces are low-dimensional and change slowly over time.
Potential for improved exploration and second-order optimization.
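The "low-dimensional" finding can be illustrated with a minimal sketch. It assumes one common way to measure it: take the top-k eigenvectors of the loss Hessian as the candidate subspace, then compute the fraction of the gradient's squared norm that lies inside that subspace. The function names and the quadratic toy loss (three high-curvature directions among 50 parameters) are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def top_k_eigenvectors(hessian: np.ndarray, k: int) -> np.ndarray:
    """Return the k eigenvectors of a symmetric Hessian with the largest eigenvalues, as columns."""
    eigvals, eigvecs = np.linalg.eigh(hessian)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return eigvecs[:, order]


def gradient_subspace_fraction(grad: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the squared gradient norm that lies in span(basis).

    Assumes `basis` has orthonormal columns, so basis @ basis.T is the orthogonal projector.
    """
    projected = basis @ (basis.T @ grad)
    return float(projected @ projected / (grad @ grad))


# Hypothetical toy problem: quadratic loss 0.5 * x^T H x with a few
# high-curvature directions and many nearly flat ones.
rng = np.random.default_rng(0)
n, k = 50, 3
q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal basis
curvatures = np.concatenate([[100.0, 50.0, 20.0], np.full(n - k, 0.01)])
H = q @ np.diag(curvatures) @ q.T

x = rng.standard_normal(n)
grad = H @ x  # gradient of the quadratic loss at x

basis = top_k_eigenvectors(H, k)
print(f"fraction of gradient in top-{k} subspace: {gradient_subspace_fraction(grad, basis):.4f}")
```

On this toy problem nearly all of the gradient falls into the three high-curvature directions, which is the kind of concentration the findings describe. For a real network the Hessian would not be formed explicitly.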
Abstract
Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace. In this paper, we conduct a thorough evaluation of this phenomenon for two popular deep policy gradient methods on various simulated benchmark tasks. Our results demonstrate the existence of such gradient subspaces despite the continuously changing data distribution inherent to reinforcement learning. These findings reveal promising directions for future work on more efficient reinforcement learning, e.g., through improving parameter-space exploration or enabling second-order optimization.
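Whether a subspace is "slowly-changing" can be quantified by comparing the top-k subspaces identified at two points in training. A minimal sketch, assuming an overlap metric defined as the average squared projection of one orthonormal basis onto the span of the other, i.e. ||V_a^T V_b||_F^2 / k. The helper names and the synthetic bases are hypothetical; they stand in for Hessian eigenvectors computed at two training steps.

```python
import numpy as np


def subspace_overlap(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Average squared projection of the columns of basis_b onto span(basis_a).

    Both arguments are n x k matrices with orthonormal columns. The result lies in
    [0, 1]: 1 means identical subspaces, while unrelated random subspaces give
    roughly k / n on average.
    """
    k = basis_b.shape[1]
    return float(np.linalg.norm(basis_a.T @ basis_b, "fro") ** 2 / k)


def random_orthonormal(rng: np.random.Generator, n: int, k: int) -> np.ndarray:
    """Hypothetical helper: an n x k matrix with orthonormal columns."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q


rng = np.random.default_rng(1)
n, k = 1000, 10
early = random_orthonormal(rng, n, k)
# Simulated slow change: perturb the earlier basis slightly and re-orthonormalize.
later, _ = np.linalg.qr(early + 0.005 * rng.standard_normal((n, k)))
unrelated = random_orthonormal(rng, n, k)

print(f"overlap(early, later)     = {subspace_overlap(early, later):.3f}")
print(f"overlap(early, unrelated) = {subspace_overlap(early, unrelated):.3f}")
```

A value close to 1 indicates that the subspace barely moved between the two steps; a value near k / n is what one would expect if the subspace had been redrawn at random.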
Peer Reviews
Decision: ICLR 2024 poster
**Originality and significance:** The paper (empirically) shows for the first time that policy gradients (and the critic gradients) of the widely popular deep RL methods PPO and SAC lie in a tiny subspace and that this subspace remains somewhat stable during the course of learning. This result is interesting in itself, as it gives us additional insight into deep RL methods. Further, such a result can have significant implications for developing policy gradient methods, as outlined by t
# Major issues (these affect my score significantly):
The major weakness of the paper is a lack of rigor and concrete results.
## 1. There is a very weak link between the experimental results and the claims made in the paper:
## (a) For example, see the following claims of the paper:
- Section 1 (last paragraph): "(i) parameter-space directions with significantly larger curvature exist in PG": --> why can we say that the curvature is significantly larger? Figure 1 is highly qualitative in na
Current literature on identifying gradient subspaces focuses on supervised learning, and related work in RL focuses on identifying parameter subspaces rather than gradient subspaces. Therefore, this work is the first to identify gradient subspaces in the context of RL and is informative for training RL algorithms. Project code is provided for reproducibility.
It is unclear to what extent identifying gradient subspaces is a significant contribution compared to existing RL work that identifies parameter subspaces (e.g., Gaya et al. 2023 in the references), since both approaches share the goal of improving the training efficiency of policy parameters.
- The paper is clearly presented and easy to follow; the motivation and the methodology are clearly described.
- The work demonstrates that the subspace also exists in policy gradient methods, similar to what has been discovered in the supervised learning setup. This is shown through relatively comprehensive experiments that include different approaches to estimating the policy gradient and the Hessian matrix and that consider both the policy model and the critic model.
- The authors also discuss how to l
One way to make the paper stronger and more convincing would be to present conclusions and domain-specific insights unique to policy gradient learning, since most of the conclusions mirror those of the gradient subspace paper in the supervised learning setup.
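Estimating the Hessian, which the reviews mention, rarely means forming it: for a deep network the full Hessian is usually impractical to store, so its top eigenvectors are typically estimated from Hessian-vector products. A minimal sketch under stated assumptions: the products are approximated by central finite differences of a gradient function, and the top-k eigenvectors are found by simple subspace (block power) iteration followed by a Rayleigh-Ritz step. This is a swapped-in, simpler substitute for solvers such as Lanczos, not the authors' implementation; `grad_fn`, the quadratic toy loss, and all function names are hypothetical.

```python
from typing import Callable, Tuple

import numpy as np

GradFn = Callable[[np.ndarray], np.ndarray]


def hessian_vector_product(grad_fn: GradFn, theta: np.ndarray, v: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Central finite-difference approximation of H(theta) @ v using two gradient calls."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)


def hvp_block(grad_fn: GradFn, theta: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Apply the Hessian to every column of `basis` via finite-difference HVPs."""
    return np.column_stack(
        [hessian_vector_product(grad_fn, theta, basis[:, i]) for i in range(basis.shape[1])]
    )


def top_k_hessian_eigenvectors(
    grad_fn: GradFn, theta: np.ndarray, k: int, iters: int = 200, seed: int = 0
) -> Tuple[np.ndarray, np.ndarray]:
    """Subspace iteration on the Hessian using only HVPs.

    Converges to the eigenvectors whose eigenvalues have the largest *magnitude*.
    Returns (eigenvalues, eigenvectors as columns), sorted by decreasing eigenvalue.
    """
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((theta.size, k)))
    for _ in range(iters):
        basis, _ = np.linalg.qr(hvp_block(grad_fn, theta, basis))
    # Rayleigh-Ritz step: diagonalize the Hessian restricted to the found subspace.
    small = basis.T @ hvp_block(grad_fn, theta, basis)
    small = 0.5 * (small + small.T)  # symmetrize away finite-difference noise
    eigvals, rot = np.linalg.eigh(small)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], basis @ rot[:, order]


# Hypothetical toy loss 0.5 * theta^T A theta with known Hessian A.
rng = np.random.default_rng(42)
n = 30
q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = q @ np.diag(np.concatenate([[10.0, 5.0, 2.0], np.full(n - 3, 0.1)])) @ q.T


def quadratic_grad(theta: np.ndarray) -> np.ndarray:
    return A @ theta  # gradient of the toy loss


theta0 = rng.standard_normal(n)
eigvals, eigvecs = top_k_hessian_eigenvectors(quadratic_grad, theta0, k=3)
print(np.round(eigvals, 4))
```

Because plain subspace iteration targets the largest-magnitude eigenvalues, a Hessian with large negative curvature would need a shift or a different solver to recover the largest algebraic eigenvalues.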
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
