Efficient decorrelation of features using Gramian in Reinforcement Learning
Borislav Mavrin, Daniel Graves, Alan Chan

TL;DR
This paper introduces an online regularization method that uses the Gramian of the features to decorrelate them in reinforcement learning, improving sample efficiency in several test environments, including Atari 2600 games.
Contribution
The paper develops a scalable, theoretically grounded decorrelation technique for RL that leaves the main objective of maximizing cumulative reward unchanged.
Findings
Significant sample efficiency improvements in 40 out of 49 Atari games.
The method is proved to converge in the linear function approximation setting.
Computational cost scales linearly in the number of features and quadratically in the batch size.
Abstract
Learning good representations is a long-standing problem in reinforcement learning (RL). One of the conventional ways to achieve this goal in the supervised setting is through regularization of the parameters. Extending some of these ideas to the RL setting has not yielded similar improvements in learning. In this paper, we develop an online regularization framework for decorrelating features in RL and demonstrate its utility in several test environments. We prove that the proposed algorithm converges in the linear function approximation setting and does not change the main objective of maximizing cumulative reward. We demonstrate how to scale the approach to deep RL using the Gramian of the features, achieving computational complexity that is linear in the number of features and quadratic in the batch size. We conduct an extensive empirical study of the new approach on Atari 2600…
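The complexity claim in the abstract can be illustrated with a standard linear-algebra identity. For a batch feature matrix Φ with n rows (samples) and d columns (features), the squared Frobenius norm of the d × d matrix ΦᵀΦ equals that of the n × n Gramian ΦΦᵀ, so a penalty on the off-diagonal entries of ΦᵀΦ can be evaluated without ever forming a d × d matrix. The sketch below is a minimal illustration under that assumption, not the authors' implementation: the function names, the centering step, and the exact form of the penalty are hypothetical choices made for this example.

```python
import numpy as np

def offdiag_penalty_direct(phi):
    """Sum of squared off-diagonal entries of phi.T @ phi, formed explicitly."""
    c = phi.T @ phi                          # (d, d) matrix, O(n * d^2) work
    return np.sum(c ** 2) - np.sum(np.diag(c) ** 2)

def offdiag_penalty_gramian(phi):
    """Same quantity computed through the (n, n) Gramian phi @ phi.T."""
    g = phi @ phi.T                          # (n, n) matrix, O(n^2 * d) work
    # Diagonal of phi.T @ phi is the squared norm of each feature column.
    col_sq_norms = np.sum(phi ** 2, axis=0)  # O(n * d) work
    # ||phi.T @ phi||_F^2 == ||phi @ phi.T||_F^2, so only the diagonal differs.
    return np.sum(g ** 2) - np.sum(col_sq_norms ** 2)

# Illustrative sizes only (assumed): a small batch and many features.
rng = np.random.default_rng(0)
phi = rng.standard_normal((32, 512))
phi = phi - phi.mean(axis=0)                 # center each feature over the batch

direct = offdiag_penalty_direct(phi)
gram = offdiag_penalty_gramian(phi)
assert np.isclose(direct, gram)
```

When the batch is much smaller than the feature dimension, the Gramian route does work proportional to n²d rather than nd², which matches the linear-in-features, quadratic-in-batch scaling described above.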
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Model Reduction and Neural Networks
Methods: Test
