Reliable validation of Reinforcement Learning Benchmarks
Matthias Müller-Brockhausen, Aske Plaat, Mike Preuss

TL;DR
This paper introduces a method for validating reinforcement learning benchmarks by providing minimal traces that enable re-simulation and verification of experimental results, enhancing reproducibility and trustworthiness.
Contribution
It proposes a minimal-trace approach for RL benchmarks that lets results be validated, reused, and inspected without extensive computation, and that integrates with existing RL tools.
Findings
Minimal traces enable re-simulation of action sequences in deterministic RL environments.
The approach significantly reduces data size for storage and sharing.
Proof-of-concept results demonstrate effectiveness across various games.
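To make the idea of a minimal trace concrete, here is a small sketch in Python. It is not the authors' implementation: the `ToyEnv` class, the trace layout (`seed`, `actions`, `claimed_return`), and the helper names are all hypothetical and chosen only for illustration. It assumes a fully deterministic environment whose only source of randomness is the reset seed, so that storing the seed and the action sequence is enough to recompute the episode return.

```python
import json
import random


class ToyEnv:
    """Hypothetical deterministic environment: a 1-D walk toward a seeded goal."""

    def __init__(self, size=10):
        self.size = size

    def reset(self, seed):
        rng = random.Random(seed)  # all randomness is derived from the seed
        self.goal = rng.randrange(1, self.size)
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 1 moves right, any other action moves left; position is clamped
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.size - 1, self.pos + delta))
        done = self.pos == self.goal
        reward = 1.0 if done else -0.1
        return self.pos, reward, done


def record_episode(env, policy, seed, max_steps=100):
    """Run one episode, keeping only the seed, the actions, and the claimed return."""
    obs = env.reset(seed)
    actions, total = [], 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        actions.append(action)
        total += reward
        if done:
            break
    return {"seed": seed, "actions": actions, "claimed_return": total}


def replay(env, trace):
    """Re-simulate the stored actions and return the recomputed episode return."""
    env.reset(trace["seed"])
    total = 0.0
    for action in trace["actions"]:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


def verify(env, trace, tol=1e-9):
    """Check that the claimed return matches the re-simulated return."""
    return abs(replay(env, trace) - trace["claimed_return"]) <= tol


if __name__ == "__main__":
    trace = record_episode(ToyEnv(), policy=lambda obs: 1, seed=42)
    print(json.dumps(trace))             # the whole trace is a few integers
    print(verify(ToyEnv(), trace))       # a fresh environment reproduces the return
```

Under these assumptions, a reviewer needs only the serialized trace and the environment code to check a reported score; observations and intermediate rewards are regenerated during replay instead of being stored.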
Abstract
Reinforcement Learning (RL) is one of the most dynamic research areas in Game AI and AI as a whole, and a wide variety of games are used as its prominent test problems. However, it is subject to the replicability crisis that currently affects most algorithmic AI research. Benchmarking in Reinforcement Learning could be improved through verifiable results. There are numerous benchmark environments whose scores are used to compare different algorithms, such as Atari. Nevertheless, reviewers must trust that figures represent truthful values, as it is difficult to reproduce an exact training curve. We propose improving this situation by providing access to the original experimental data to validate study results. To that end, we rely on the concept of minimal traces. These allow re-simulation of action sequences in deterministic RL environments and, in turn, enable reviewers to verify,…
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications
