Interpretable performance analysis towards offline reinforcement learning: A dataset perspective
Chenyang Xi, Bo Tang, Jiajun Shen, Xinfu Liu, Feiyu Xiong, Xueying Li

TL;DR
This paper analyzes offline reinforcement learning from a dataset perspective. It proposes a two-fold taxonomy of existing algorithms, derives an upper bound on extrapolation error, presents a modified algorithm, and provides a benchmark platform (RLEG).
Contribution
It offers a novel taxonomy, theoretical insights into extrapolation error, an improved algorithm, and an open-source benchmark platform for offline RL research.
Findings
BCQ outperforms other techniques under certain conditions
The modified top return selection improves performance on low-return datasets
The RLEG benchmark enables fair comparison of offline RL algorithms
Abstract
Offline reinforcement learning (RL) has increasingly become a focus of artificial intelligence research due to its wide real-world applications where the collection of data may be difficult, time-consuming, or costly. In this paper, we first propose a two-fold taxonomy for existing offline RL algorithms from the perspective of exploration and exploitation tendency. Secondly, we derive an explicit expression for the upper bound of the extrapolation error and explore the correlation between the performance of different types of algorithms and the distribution of actions under states. Specifically, we relax the strict assumption of a sufficiently large number of state-action tuples. Accordingly, we provably explain why batch-constrained Q-learning (BCQ) performs better than other existing techniques. Thirdly, after identifying the weakness of BCQ on datasets with low mean episode returns,…
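The Findings mention a modified top return selection that helps on low-return datasets, but the abstract is cut off before describing it. As a rough illustration of the general idea only, the sketch below ranks episodes by undiscounted return and keeps the best fraction. The function names, the `fraction` parameter, and the ranking rule are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def episode_return(rewards: Sequence[float]) -> float:
    """Undiscounted return of one episode (sum of its rewards)."""
    return float(sum(rewards))


def top_return_indices(
    episode_rewards: Sequence[Sequence[float]], fraction: float = 0.5
) -> List[int]:
    """Return indices of the highest-return episodes.

    Hypothetical helper: keeps ceil(fraction * n) episodes, at least one
    when the dataset is non-empty. Not the paper's exact procedure.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    returns = [episode_return(r) for r in episode_rewards]
    keep = max(1, math.ceil(fraction * len(returns)))
    # Rank episode indices from highest to lowest return.
    ranked = sorted(range(len(returns)), key=lambda i: returns[i], reverse=True)
    # Restore dataset order for the kept episodes.
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Toy dataset: four episodes given as per-step reward lists.
    rewards = [[1.0, 1.0], [5.0], [0.0, 0.0, 0.0], [2.0, 2.0]]
    print(top_return_indices(rewards, fraction=0.5))
```

Under this assumption, an offline algorithm such as BCQ would then be trained only on the transitions from the selected episodes; whether the paper filters, reweights, or does something else is not stated in the text above.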
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Data Stream Mining Techniques
Methods: Q-Learning
