Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing
Declan Oller, Tobias Glasmachers, Giuseppe Cuccu

TL;DR
This paper introduces a new method using random policy networks to objectively analyze and visualize the complexity of reinforcement learning benchmarks, revealing insights into environment difficulty and benchmark triviality.
Contribution
The study presents Random Weight Guessing as a learning-agnostic approach to evaluate RL benchmarks, providing a baseline and insights into environment complexity without training.
Findings
Random networks can serve as robust baselines.
Some benchmarks are trivial for untrained networks.
The method isolates environment complexity effectively.
Abstract
We propose a novel method for analyzing and visualizing the complexity of standard reinforcement learning (RL) benchmarks based on score distributions. A large number of policy networks are generated by randomly guessing their parameters, and then evaluated on the benchmark task; the study of their aggregated results provides insights into the benchmark complexity. Our method guarantees objectivity of evaluation by sidestepping learning altogether: the policy network parameters are generated using Random Weight Guessing (RWG), making our method agnostic to (i) the classic RL setup, (ii) any learning algorithm, and (iii) hyperparameter tuning. We show that this approach isolates the environment complexity, highlights specific types of challenges, and provides a proper foundation for the statistical analysis of the task's difficulty. We test our approach on a variety of classic control…
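The procedure the abstract describes (sample many parameter vectors, evaluate each one over several episodes, then study the aggregated scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyBalanceEnv` is a hypothetical stand-in task invented for this example, the policy is a single linear layer with bias, and drawing weights from a standard normal distribution is an assumed choice.

```python
import numpy as np


class ToyBalanceEnv:
    """Hypothetical stand-in task (not from the paper): keep a 1-D point
    near the origin by pushing it left (action 0) or right (action 1)."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.state = np.zeros(2)
        self.t = 0

    def reset(self, rng):
        # Start close to the origin with a small random position and velocity.
        self.state = rng.uniform(-0.05, 0.05, size=2)
        self.t = 0
        return self.state.copy()

    def step(self, action):
        pos, vel = self.state
        force = 1.0 if action == 1 else -1.0
        # Unstable dynamics: the position term pushes the point away from 0.
        vel = vel + 0.05 * force + 0.1 * pos
        pos = pos + 0.1 * vel
        self.state = np.array([pos, vel])
        self.t += 1
        done = abs(pos) > 1.0 or self.t >= self.max_steps
        return self.state.copy(), 1.0, done  # reward of 1 per surviving step


def sample_policy(rng, n_obs, n_actions):
    # Assumption: every weight and bias is an independent standard normal draw.
    weights = rng.standard_normal((n_actions, n_obs))
    bias = rng.standard_normal(n_actions)
    return weights, bias


def run_episode(env, policy, rng):
    weights, bias = policy
    obs = env.reset(rng)
    total, done = 0.0, False
    while not done:
        action = int(np.argmax(weights @ obs + bias))  # greedy discrete action
        obs, reward, done = env.step(action)
        total += reward
    return total


def random_weight_guessing(n_samples=200, n_episodes=5, seed=0):
    """Return a (n_samples, n_episodes) array of episode scores.
    No learning takes place: each policy is guessed once, then only evaluated."""
    rng = np.random.default_rng(seed)
    env = ToyBalanceEnv()
    scores = np.empty((n_samples, n_episodes))
    for i in range(n_samples):
        policy = sample_policy(rng, n_obs=2, n_actions=2)
        for j in range(n_episodes):
            scores[i, j] = run_episode(env, policy, rng)
    return scores


if __name__ == "__main__":
    scores = random_weight_guessing()
    mean_scores = np.sort(scores.mean(axis=1))  # one mean score per guessed policy
    print("percentiles of mean score (10/50/90/100):",
          np.percentile(mean_scores, [10, 50, 90, 100]))
```

Sorting the per-policy mean scores gives a simple view of the score distribution: if a noticeable share of guessed policies already reaches the maximum score, the task is likely trivial for this policy class, while a distribution concentrated at the minimum suggests that good parameters are rare.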
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Evolutionary Algorithms and Applications
