Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe

arXiv:2103.15332 · cs.LG · March 30, 2021 · 6 citations

TL;DR

This paper introduces a centralized benchmark for measuring sample efficiency and generalization in reinforcement learning, built on the Procgen Benchmark suite of procedurally generated environments, to evaluate diverse algorithms in a scalable, standardized way.

Contribution

The paper designs a scalable, standardized benchmark for assessing sample efficiency and generalization in reinforcement learning, making it easier to measure progress and compare methods.

Findings

1. Top solutions demonstrated improved generalization.
2. The benchmark setup enabled comprehensive evaluation of diverse algorithms.
3. The analysis provided insights into the strengths and weaknesses of the competing methods.

Abstract

The NeurIPS 2020 Procgen Competition was designed as a centralized benchmark with clearly defined tasks for measuring sample efficiency and generalization in reinforcement learning. Generalization remains one of the most fundamental challenges in deep reinforcement learning, yet the community has too few benchmarks for measuring its progress on generalization. We present the design of a centralized benchmark that measures sample efficiency and generalization by running end-to-end evaluations of the training and rollout phases of thousands of user-submitted code bases in a scalable way. We built the benchmark on top of the existing Procgen Benchmark by defining clear tasks and standardizing the end-to-end evaluation setup. The design aims to maximize the flexibility available…
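To make "measuring generalization" concrete, here is a minimal sketch of how a benchmark of this kind might summarize a submission. It assumes the min-max return normalization used in the original Procgen Benchmark paper, R_norm = (R - R_min) / (R_max - R_min), and a split between training levels and held-out levels. The `EnvResult` type, `score` function, environment names, returns, and normalization constants are all hypothetical. This is an illustration, not the competition's actual scoring code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EnvResult:
    """Hypothetical per-environment evaluation result for one submission."""
    name: str
    train_return: float  # mean episode return on the training levels
    test_return: float   # mean episode return on held-out (unseen) levels
    r_min: float         # assumed minimum return used for normalization
    r_max: float         # assumed maximum return used for normalization


def normalize(ret: float, r_min: float, r_max: float) -> float:
    """Min-max normalize a raw return so environments share a common scale."""
    return (ret - r_min) / (r_max - r_min)


def score(results: list[EnvResult]) -> dict[str, float]:
    """Average normalized returns across environments and report the gap.

    A large positive gap means the agent does much better on levels it
    trained on than on unseen levels, i.e. it generalizes poorly.
    """
    train = mean(normalize(r.train_return, r.r_min, r.r_max) for r in results)
    test = mean(normalize(r.test_return, r.r_min, r.r_max) for r in results)
    return {"train": train, "test": test, "gap": train - test}


# Hypothetical numbers, for illustration only.
results = [
    EnvResult("env_a", train_return=8.0, test_return=6.0, r_min=0.0, r_max=10.0),
    EnvResult("env_b", train_return=15.0, test_return=9.0, r_min=5.0, r_max=25.0),
]
print(score(results))
```

Normalizing before averaging keeps an environment with large raw rewards from dominating the aggregate score. Reporting the train/test gap alongside the held-out score separates memorization of training levels from genuine generalization.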

Taxonomy

Topics: Reinforcement Learning in Robotics