Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate
Fan-Ming Luo, Zuolin Tu, Zefang Huang, Yang Yu

TL;DR
This paper introduces RESeL, a novel recurrent off-policy reinforcement learning method that employs a context-encoder-specific learning rate to improve training stability and performance across various POMDP and MDP tasks.
Contribution
The paper proposes training the context encoder in recurrent RL with a lower learning rate than the other layers, which improves training stability and performance, and integrates this technique into existing algorithms.
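The idea of a component-specific learning rate can be illustrated with a small sketch. The code below is not the authors' implementation: it is a plain SGD step in pure Python in which each parameter group is updated with its own learning rate. The group names, the `sgd_step` helper, and the two rate values are assumptions chosen for illustration; the source does not state specific rates or an optimizer.

```python
from typing import Dict, List

# A parameter group maps a component name to its (flattened) weights.
Params = Dict[str, List[float]]

# Assumed values for illustration only: the context encoder gets a much
# smaller learning rate than the MLP layers.
LEARNING_RATES: Dict[str, float] = {"context_encoder": 1e-5, "mlp": 3e-4}


def sgd_step(params: Params, grads: Params, lrs: Dict[str, float]) -> Params:
    """Return new parameters after one SGD step, using each group's own rate."""
    return {
        group: [w - lrs[group] * g for w, g in zip(weights, grads[group])]
        for group, weights in params.items()
    }


# Toy weights and gradients (hypothetical numbers).
params: Params = {"context_encoder": [0.5, -0.2], "mlp": [0.1, 0.3]}
grads: Params = {"context_encoder": [1.0, 1.0], "mlp": [1.0, 1.0]}

updated = sgd_step(params, grads, LEARNING_RATES)

# With identical gradients, the encoder weights move far less than the MLP weights.
encoder_shift = abs(updated["context_encoder"][0] - params["context_encoder"][0])
mlp_shift = abs(updated["mlp"][0] - params["mlp"][0])
```

In a deep learning framework the same effect is usually obtained through optimizer parameter groups; for example, PyTorch optimizers accept a list of dictionaries, each with its own `params` and `lr` entries.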
Findings
RESeL significantly improves training stability in POMDP tasks.
RESeL outperforms previous recurrent RL baselines in POMDP scenarios.
RESeL is competitive with or surpasses state-of-the-art methods in MDP tasks.
Abstract
Real-world decision-making tasks are usually partially observable Markov decision processes (POMDPs), where the state is not fully observable. Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks. However, previous recurrent RL methods face training stability issues due to the gradient instability of RNNs. In this paper, we propose Recurrent Off-policy RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue. Specifically, RESeL uses a lower learning rate for the context encoder than for the other MLP layers to ensure the stability of the former while maintaining the training efficiency of the latter. We…
Taxonomy
Topics: Analog and Mixed-Signal Circuit Design · Advanced Adaptive Filtering Techniques · IoT-based Smart Home Systems
