SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks

Zheng Chun

arXiv:2501.03676 · cs.LG · January 14, 2025

TL;DR

This paper introduces a novel offline reinforcement learning algorithm that combines ensemble Q-networks, gradient diversity penalties, and behavior cloning to improve stability, convergence speed, and performance on benchmark tasks.

Contribution

It presents a new model-free actor-critic method that combines ensemble Q-networks, a gradient diversity penalty, and an adjustable behavior cloning term, improving the handling of out-of-distribution actions and the stability of training.
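A minimal sketch of how an ensemble of Q-networks can act as a pessimism penalty, assuming the critic target takes the minimum over N target heads as in SAC-N and EDAC. This summary does not state how the paper aggregates its heads, so the min rule, the function name `ensemble_min_target`, and its arguments are hypothetical. Each head Q_j(s, a) would then be regressed onto this shared target.

```python
from typing import Sequence


def ensemble_min_target(
    reward: float,
    done: bool,
    next_q_values: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Pessimistic TD target bootstrapped from the smallest ensemble estimate.

    next_q_values[j] is Q'_j(s', a') from target head j, where a' is the
    target actor's (deterministic) action at the next state s'.
    Hypothetical helper: the min-over-heads rule is an assumption.
    """
    if len(next_q_values) == 0:
        raise ValueError("need at least one ensemble head")
    bootstrap = min(next_q_values)   # the most pessimistic head sets the value
    not_done = 0.0 if done else 1.0  # no bootstrapping past terminal states
    return reward + gamma * not_done * bootstrap


# Five heads that disagree about a'; the outlier 1.0 determines the target.
heads = [3.2, 2.9, 1.0, 3.5, 3.1]
target = ensemble_min_target(reward=0.5, done=False, next_q_values=heads)
```

Because heads tend to disagree most on actions absent from the dataset, taking the minimum lowers exactly those estimates; a larger N makes the minimum more conservative.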

Findings

1. Achieves faster convergence and more stable training.

2. Outperforms existing methods on the D4RL MuJoCo benchmarks.

3. Effectively suppresses overestimation of Q-values for out-of-distribution actions.
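The abstract attributes the suppression of out-of-distribution overestimation partly to the gradient diversity penalty taken from EDAC, which discourages ensemble heads from agreeing on how Q changes with the action. A minimal sketch, assuming the penalty is the sum over ordered pairs i ≠ j of the cosine similarity between action-gradients, divided by N − 1. The normalization and the helper names are assumptions. Real implementations obtain ∇_a Q_j by autodiff; here the gradients are passed in directly to keep the sketch dependency-free.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float], eps: float = 1e-10) -> List[float]:
    """Scale a gradient to (roughly) unit length; eps guards against zero vectors."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def gradient_diversity_penalty(action_grads: Sequence[Sequence[float]]) -> float:
    """EDAC-style penalty at one dataset (s, a) pair (hypothetical helper).

    action_grads[j] is the gradient of head j's Q-value w.r.t. the action.
    Returns (1 / (N - 1)) * sum over ordered pairs i != j of cos(g_i, g_j),
    which is large when the heads' gradients point the same way.
    """
    n = len(action_grads)
    if n < 2:
        raise ValueError("need at least two ensemble heads")
    units = [unit(g) for g in action_grads]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += sum(x * y for x, y in zip(units[i], units[j]))
    return total / (n - 1)


# Gradients all pointing the same way: maximal penalty (about N = 3).
aligned = gradient_diversity_penalty([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
# Spread-out gradients: the penalty drops below zero.
diverse = gradient_diversity_penalty([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

In a full critic update, this penalty (averaged over a batch) would be added to the ensemble's TD loss with a weight such as EDAC's η; that weight is not given in this summary.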

Abstract

In this work, we build upon the offline reinforcement learning algorithm TD7, which incorporates State-Action Learned Embeddings (SALE) and a prioritized experience replay buffer (LAP). We propose a model-free actor-critic algorithm that integrates ensemble Q-networks and a gradient diversity penalty from EDAC. The ensemble Q-networks introduce penalties to guide the actor network toward in-distribution actions, effectively addressing the challenge of out-of-distribution actions. Meanwhile, the gradient diversity penalty encourages diverse Q-value gradients, further suppressing overestimation for out-of-distribution actions. Additionally, our method retains an adjustable behavior cloning (BC) term that directs the actor network toward dataset actions during early training stages, while gradually reducing its influence as the precision of the Q-ensemble improves. These enhancements work…
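A minimal sketch of an actor loss with an adjustable behavior cloning term, assuming the TD3+BC-style form -mean(Q) + λ · mean(|Q|) · MSE(π(s), a), where scaling by mean(|Q|) keeps the two terms on comparable scales. The abstract ties the decay of λ to the Q-ensemble's accuracy; the step-based linear schedule below, its constants, and both function names are hypothetical stand-ins.

```python
from typing import Sequence


def bc_weight(step: int, start: float = 1.0, end: float = 0.1,
              decay_steps: int = 100_000) -> float:
    """Hypothetical schedule: linearly anneal the BC coefficient from start to end."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return start + frac * (end - start)


def actor_loss(q_values: Sequence[float],
               policy_actions: Sequence[Sequence[float]],
               dataset_actions: Sequence[Sequence[float]],
               step: int) -> float:
    """-mean(Q) + lambda(step) * mean(|Q|) * MSE(pi(s), a) over a batch.

    q_values[k] is the critic's value of the policy action for sample k,
    already aggregated across the ensemble (aggregation rule assumed).
    """
    batch = len(q_values)
    q_mean = sum(q_values) / batch
    q_scale = sum(abs(q) for q in q_values) / batch  # keeps BC term on Q's scale
    sq_errors = [(p - d) ** 2
                 for pa, da in zip(policy_actions, dataset_actions)
                 for p, d in zip(pa, da)]
    mse = sum(sq_errors) / len(sq_errors)            # mean over all action dims
    return -q_mean + bc_weight(step) * q_scale * mse


q = [2.0, 4.0]
pi_a = [[0.5], [0.0]]
data_a = [[0.0], [0.0]]
early = actor_loss(q, pi_a, data_a, step=0)        # BC term weighted by 1.0
late = actor_loss(q, pi_a, data_a, step=200_000)   # BC term weighted by 0.1
```

Early in training the BC term dominates whenever π(s) strays from dataset actions; as λ shrinks, the Q term takes over, which matches the qualitative behavior described in the abstract.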

Taxonomy

Topics: Elevator Systems and Control · Neural Networks and Applications · Reinforcement Learning in Robotics

Methods: Experience Replay · Prioritized Experience Replay · Focus
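The prioritized replay listed among the methods corresponds to TD7's LAP buffer mentioned in the abstract. A minimal sketch of LAP-style sampling, assuming priority max(|δ|, 1)^α with α = 0.4 and a Huber loss with κ = 1; these values are illustrative, not settings reported in this summary. Unlike standard prioritized replay, no importance-sampling weights are applied. Function names are hypothetical, and a practical buffer would use a sum-tree instead of recomputing weights on every draw.

```python
import random
from typing import List, Sequence


def lap_priorities(td_errors: Sequence[float], alpha: float = 0.4,
                   min_priority: float = 1.0) -> List[float]:
    """LAP priority: max(|delta|, min_priority) ** alpha (values assumed).

    The floor of 1 gives every transition with |delta| <= 1 the same
    priority, so low-error samples are not starved.
    """
    return [max(abs(d), min_priority) ** alpha for d in td_errors]


def huber(delta: float, kappa: float = 1.0) -> float:
    """Huber loss on a TD error; LAP pairs its sampling scheme with this loss."""
    a = abs(delta)
    return 0.5 * delta * delta if a <= kappa else kappa * (a - 0.5 * kappa)


def sample_batch(priorities: Sequence[float], batch_size: int,
                 rng: random.Random) -> List[int]:
    """Draw indices with probability proportional to priority, with replacement."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


prios = lap_priorities([0.1, 0.5, 3.0, 10.0])
batch = sample_batch(prios, batch_size=8, rng=random.Random(0))
```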