Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate

Fan-Ming Luo; Zuolin Tu; Zefang Huang; Yang Yu

arXiv:2405.15384·cs.LG·May 27, 2024

TL;DR

This paper introduces RESeL, a novel recurrent off-policy reinforcement learning method that employs a context-encoder-specific learning rate to improve training stability and performance across various POMDP and MDP tasks.

Contribution

The paper proposes training the context encoder in recurrent RL with a lower learning rate than the other layers, which improves training stability and performance, and integrates this technique into existing algorithms.
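As a rough illustration of the idea, the sketch below applies one plain SGD step to two parameter groups that share the same gradient but use different learning rates. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `ParamGroup` and `sgd_step`, the choice of SGD, and the learning-rate values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    """A named set of parameters that shares one learning rate (hypothetical helper)."""
    name: str
    params: list
    lr: float


def sgd_step(groups, grads):
    """Apply one SGD step, scaling each group's update by that group's own learning rate."""
    for group in groups:
        group_grads = grads[group.name]
        group.params = [p - group.lr * g for p, g in zip(group.params, group_grads)]


# Assumed values for illustration only; they are not taken from the paper.
groups = [
    ParamGroup("context_encoder", [1.0, -2.0], lr=1e-5),  # recurrent encoder: lower rate
    ParamGroup("mlp_policy", [1.0, -2.0], lr=3e-4),       # MLP layers: higher rate
]

# Identical gradients for both groups, so any difference comes from the learning rate.
grads = {"context_encoder": [0.5, 0.5], "mlp_policy": [0.5, 0.5]}
sgd_step(groups, grads)

encoder_shift = abs(groups[0].params[0] - 1.0)
mlp_shift = abs(groups[1].params[0] - 1.0)
```

With identical gradients, the encoder parameters move less per step than the MLP parameters. Deep learning frameworks generally expose the same idea through per-group optimizer settings.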

Findings

1. RESeL significantly improves training stability in POMDP tasks.
2. RESeL outperforms previous recurrent RL baselines in POMDP scenarios.
3. RESeL is competitive with or surpasses state-of-the-art methods in MDP tasks.

Abstract

Real-world decision-making tasks are usually partially observable Markov decision processes (POMDPs), where the state is not fully observable. Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks. However, previous recurrent RL methods face training stability issues due to the gradient instability of RNNs. In this paper, we propose Recurrent Off-policy RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue. Specifically, RESeL uses a lower learning rate for the context encoder than for the other MLP layers to ensure the stability of the former while maintaining the training efficiency of the latter. We…

Code & Models

Repositories

FanmingL/Recurrent-Offpolicy-RL
jax · Official

Taxonomy

Topics: Analog and Mixed-Signal Circuit Design · Advanced Adaptive Filtering Techniques · IoT-based Smart Home Systems