Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

Qiang He; Tianyi Zhou; Meng Fang; Setareh Maghsudi

arXiv:2306.16750·cs.LG·November 9, 2023

Open Access

TL;DR

This paper introduces ERC, a regularization method for deep reinforcement learning that uses the 1-eigensubspace of the TD error dynamics to make value approximation more stable and efficient. The authors prove its convergence and report that it outperforms state-of-the-art methods on 20 of 26 DMControl tasks.

Contribution

ERC is the first method to utilize the 1-eigensubspace of the transition kernel for regularizing value approximation in deep RL, enhancing stability and reducing variance.
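To make the "1-eigensubspace of the transition kernel" concrete, here is a minimal NumPy sketch. It is a simplified tabular illustration, not the paper's analysis: it assumes a small, randomly generated row-stochastic matrix `P` and the discrete error recursion `e_{k+1} = gamma * P @ e_k` of exact policy evaluation. The helper names (`random_stochastic_matrix`, `cosine_to_ones`, `error_path`) are hypothetical. The sketch checks that the all-ones vector is an eigenvector of `P` with eigenvalue 1, then tracks how closely the error aligns with that direction as it shrinks.

```python
import numpy as np

def random_stochastic_matrix(n, seed=0):
    """Hypothetical helper: dense row-stochastic matrix with positive entries."""
    rng = np.random.default_rng(seed)
    P = rng.random((n, n)) + 0.1
    return P / P.sum(axis=1, keepdims=True)

def cosine_to_ones(e):
    """Absolute cosine between e and the all-ones direction (1.0 = aligned)."""
    ones = np.ones_like(e)
    return abs(e @ ones) / (np.linalg.norm(e) * np.linalg.norm(ones))

def error_path(P, e0, gamma, steps):
    """Iterate e <- gamma * P @ e (assumed tabular error recursion)."""
    e = e0.copy()
    path = [cosine_to_ones(e)]
    for _ in range(steps):
        e = gamma * (P @ e)
        path.append(cosine_to_ones(e))
    return e, path

P = random_stochastic_matrix(5)
ones = np.ones(5)
# Rows of P sum to 1, so P @ 1 = 1: the ones vector spans a 1-eigensubspace.
is_one_eigenvector = np.allclose(P @ ones, ones)

e0 = np.array([4.0, -1.0, 0.5, 3.0, -0.5])  # arbitrary initial error
e_final, path = error_path(P, e0, gamma=0.9, steps=30)
```

In this toy setting the error shrinks in norm because `gamma < 1`, while its direction moves toward the all-ones vector, so the remaining error becomes nearly the same constant for every entry.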

Findings

01. ERC outperforms state-of-the-art methods on 20 out of 26 DMControl tasks.

02. The paper gives a theoretical proof of ERC's convergence.

03. ERC significantly reduces variance in value function estimates.

Abstract

We propose a novel value approximation method, the Eigensubspace Regularized Critic (ERC), for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of the Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). This analysis reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error toward the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. We theoretically prove the convergence of the ERC method, and both theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC…
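One way to read "guiding the approximation error toward the 1-eigensubspace" is as penalizing the part of the error that lies outside the span of the all-ones vector. The sketch below is an illustration under that assumption and is not claimed to be the paper's exact objective. The function names, the weight `beta`, and the use of a batch of errors as a stand-in for the full error vector are all hypothetical. It relies on one fact: the orthogonal projection of a vector onto span{1} is its mean times the ones vector, so the squared distance to that subspace is the sum of squared deviations from the mean.

```python
import numpy as np

def distance_to_ones_subspace(err):
    """Squared Euclidean distance from err to span{1} (projection = mean * 1)."""
    err = np.asarray(err, dtype=float)
    return float(np.sum((err - err.mean()) ** 2))

def regularized_critic_loss(q_pred, td_target, beta=0.1):
    """Hypothetical loss: mean squared TD error plus beta times the
    variance of the error, which vanishes when the error is constant."""
    err = np.asarray(q_pred, dtype=float) - np.asarray(td_target, dtype=float)
    td_loss = np.mean(err ** 2)
    penalty = np.mean((err - err.mean()) ** 2)  # assumed form of the regularizer
    return float(td_loss + beta * penalty)

# A constant error lies in span{1}, so only the TD term contributes.
loss_constant = regularized_critic_loss([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], beta=0.5)
# A non-constant error also pays the penalty term.
loss_varied = regularized_critic_loss([1.0, 0.0], [0.0, 0.0], beta=1.0)
```

Under this reading, the penalty is the empirical variance of the error, which is consistent with the abstract's claim that ERC reduces the variance of value functions, though the paper itself should be consulted for the actual loss.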

Taxonomy

Topics: Reinforcement Learning in Robotics