Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies
Shivakanth Sujit, Pedro H. M. Braga, Jorg Bornschein, Samira Ebrahimi Kahou

TL;DR
This paper introduces a sequential evaluation methodology for offline reinforcement learning algorithms, enabling better assessment of data efficiency and robustness across different datasets and tasks.
Contribution
It proposes a new evaluation protocol that measures offline RL performance as a function of training set size, unifying offline and online assessment methods.
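To make the idea of "performance as a function of training set size" concrete, here is a minimal sketch of such an evaluation loop. It is an illustration under stated assumptions, not the authors' implementation: the function `sequential_evaluation`, its parameters (`chunk_size`, `updates_per_chunk`, `batch_size`), and the toy `RunningMeanLearner` are all hypothetical names introduced here. The logged dataset is revealed to the learner chunk by chunk, the learner takes a fixed number of updates on what it has seen so far, and a score is recorded against the number of samples seen.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sequential_evaluation(
    dataset: Sequence,
    update: Callable[[list], None],
    evaluate: Callable[[], float],
    chunk_size: int = 100,
    updates_per_chunk: int = 10,
    batch_size: int = 32,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return a curve of (samples seen, score) pairs.

    Hypothetical sketch: the schedule (chunk size, updates per chunk)
    is an assumption, not a value taken from the paper.
    """
    rng = random.Random(seed)
    buffer: list = []
    curve: List[Tuple[int, float]] = []
    for start in range(0, len(dataset), chunk_size):
        # Reveal the next slice of the logged data to the learner.
        buffer.extend(dataset[start:start + chunk_size])
        # Train only on data revealed so far.
        for _ in range(updates_per_chunk):
            batch = rng.sample(buffer, min(batch_size, len(buffer)))
            update(batch)
        # Record performance against the current training set size.
        curve.append((len(buffer), evaluate()))
    return curve


class RunningMeanLearner:
    """Toy stand-in for an offline RL agent (illustration only).

    It estimates the mean of scalar samples; the score is the
    negative absolute error against a known target.
    """

    def __init__(self, target: float) -> None:
        self.target = target
        self.estimate = 0.0
        self.steps = 0

    def update(self, batch: list) -> None:
        self.steps += 1
        batch_mean = sum(batch) / len(batch)
        self.estimate += (batch_mean - self.estimate) / self.steps

    def evaluate(self) -> float:
        return -abs(self.estimate - self.target)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(1.0, 0.5) for _ in range(1000)]
    learner = RunningMeanLearner(target=1.0)
    for n_samples, score in sequential_evaluation(
        data, learner.update, learner.evaluate
    ):
        print(n_samples, round(score, 4))
```

Plotting the returned curve for several algorithms on the same dataset gives a learning curve indexed by data seen, which is the same kind of plot used to compare online RL methods by environment interactions. In a real setting, `update` would wrap an offline RL algorithm's gradient step and `evaluate` would run policy rollouts in the environment.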
Findings
Sequential evaluation reveals data efficiency differences among algorithms.
The approach provides insights into robustness to dataset distribution shifts.
Comparison across multiple tasks demonstrates the method's general applicability.
Abstract
Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous amount of environment interactions for learning. This can be infeasible in situations where such interactions are expensive, such as in robotics. Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data without needing to interact with the environment from the very beginning. While online RL algorithms are typically evaluated as a function of the number of environment interactions, there exists no single established protocol for evaluating offline RL methods. In this paper, we propose a sequential approach to evaluate offline RL algorithms as a function of the training set size and thus by…
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research
