Efficient Anti-exploration via VQVAE and Fuzzy Clustering in Offline Reinforcement Learning
Long Chen, Yinkui Liu, Shen Li, Bo Tang, Xuemin Hu

TL;DR
This paper introduces a novel anti-exploration method for offline reinforcement learning that counts state-action pairs using VQVAE and fuzzy clustering, addressing the dimensionality and information-loss problems of discretization and improving learning efficiency and performance.
Contribution
It proposes a multi-codebook VQVAE for discretizing state-action pairs and a fuzzy C-means based codebook update mechanism, enhancing efficiency and reducing information loss.
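As a rough illustration of what a fuzzy C-means based codebook update can look like, the sketch below applies one textbook fuzzy C-means step to a set of latent vectors. It is not the paper's exact mechanism: the function name `fuzzy_cmeans_step`, the fuzzifier `m`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def fuzzy_cmeans_step(latents, codebook, m=2.0, eps=1e-8):
    """One standard fuzzy C-means update (hypothetical helper, not from the paper).

    latents:  (N, D) vectors to cluster, e.g. encoder outputs
    codebook: (K, D) current code vectors
    m:        fuzzifier, must be > 1 (assumed value)
    """
    # Squared Euclidean distance from every latent to every code: (N, K)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1) + eps
    # Soft membership u_ik proportional to d_ik^(-2/(m-1)); each row sums to 1
    inv = d2 ** (-1.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)
    # Each code moves to the mean of all latents weighted by u_ik^m
    w = u ** m
    new_codebook = (w.T @ latents) / w.sum(axis=0)[:, None]
    return new_codebook, u

# Toy usage: two well-separated blobs and two code vectors
rng = np.random.default_rng(0)
latents = np.concatenate([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(5.0, 0.1, size=(50, 2)),
])
codebook = np.array([[1.0, 1.0], [4.0, 4.0]])
for _ in range(10):
    codebook, u = fuzzy_cmeans_step(latents, codebook)
```

Unlike a hard nearest-code assignment, every latent contributes to every code vector here, weighted by its membership, so each code receives an update signal on every step.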
Findings
Outperforms state-of-the-art methods on D4RL benchmarks.
Requires less computational cost.
Improves policy learning in complex tasks.
Abstract
Pseudo-counting is an effective anti-exploration method in offline reinforcement learning (RL): it counts state-action pairs and imposes a large penalty on rare or unseen ones. Existing anti-exploration methods count continuous state-action pairs by discretizing them, but the discretization often suffers from the curse of dimensionality and information loss, which reduces efficiency and performance and can even cause policy learning to fail. This paper proposes a novel anti-exploration method for offline RL based on the Vector Quantized Variational Autoencoder (VQVAE) and fuzzy clustering. We first propose an efficient pseudo-count method based on a multi-codebook VQVAE to discretize state-action pairs, and design an offline RL anti-exploration method based on this pseudo-count method to handle the curse of dimensionality and improve…
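The pseudo-count idea in the abstract can be sketched in a few lines: quantize each state-action vector against several small codebooks, treat the resulting tuple of code indices as a discrete key, count keys over the dataset, and penalize keys with low counts. Everything concrete below is an assumption for illustration: the helper names `quantize` and `count_penalty`, the penalty form `beta / sqrt(n + 1)`, the hand-picked codebooks, and the use of raw vectors in place of a trained VQVAE encoder.

```python
from collections import Counter
import numpy as np

def quantize(x, codebooks):
    """Map each row of x to a tuple of code indices, one index per codebook.

    x:         (N, D) vectors; raw state-action pairs here, whereas a trained
               VQVAE encoder would normally supply latents (assumption)
    codebooks: list of G arrays, each of shape (K, D // G)
    """
    chunks = np.split(x, len(codebooks), axis=1)
    idx = []
    for chunk, cb in zip(chunks, codebooks):
        # Nearest code in this codebook for each sub-vector
        d2 = ((chunk[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx.append(d2.argmin(axis=1))
    return [tuple(int(i) for i in row) for row in np.stack(idx, axis=1)]

def count_penalty(query, counts, codebooks, beta=1.0):
    """Penalty that shrinks as the pseudo-count n grows (assumed form)."""
    keys = quantize(query, codebooks)
    n = np.array([counts.get(k, 0) for k in keys], dtype=float)
    return beta / np.sqrt(n + 1.0)

# Toy usage: two 1-D codebooks with codes at 0 and 1
codebooks = [np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]])]
dataset = np.array([[0.1, 0.0]] * 8 + [[0.9, 1.0]])  # key (0, 0) x8, key (1, 1) x1
counts = Counter(quantize(dataset, codebooks))

query = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # frequent, rare, unseen
penalty = count_penalty(query, counts, codebooks)
```

With G codebooks of K codes each, the key space has K^G cells while only G·K code vectors are stored, which is one plausible reading of why a multi-codebook design helps with dimensionality. In an offline RL loop, a penalty like this would typically be subtracted from the value target for the queried state-action pair.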
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
