Dungeons and Data: A Large-Scale NetHack Dataset
Eric Hambro, Roberta Raileanu, Danielle Rothermel, Vegard Mella, Tim Rocktäschel, Heinrich Küttler, Naila Murray

TL;DR
This paper introduces the NetHack Learning Dataset (NLD), a large and highly scalable dataset of NetHack game trajectories designed to advance research on challenging sequential decision-making tasks. It targets two obstacles: the scarcity of open-source datasets and the high computational cost of working with them.
Contribution
The paper presents NLD, a dataset comprising 10 billion state transitions from human play and 3 billion state-action-score transitions from a symbolic bot, and provides tools for data handling, enabling new research in reinforcement learning and decision making.
Findings
Existing algorithms struggle to fully leverage the dataset.
Significant research advances are needed for large-scale sequential decision tasks.
The dataset enables benchmarking and development of new RL methods.
Abstract
Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost to work with them. Here we present the NetHack Learning Dataset (NLD), a large and highly-scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and, accompanying code for…
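To give a sense of scale, the short sketch below divides the transition counts quoted in the abstract by the matching trajectory counts. The dictionary layout and the helper name `mean_length` are illustrative assumptions, not part of the NLD code, and the abstract's figures are rounded, so the results are only rough averages.

```python
# Counts as quoted in the abstract (rounded figures).
# The structure and names below are hypothetical, for illustration only.
PARTS = {
    "human play (NAO server, 2009-2020)": {
        "transitions": 10_000_000_000,
        "trajectories": 1_500_000,
    },
    "symbolic bot (NetHack Challenge 2021 winner)": {
        "transitions": 3_000_000_000,
        "trajectories": 100_000,
    },
}


def mean_length(part):
    """Average number of transitions per trajectory for one dataset part."""
    return part["transitions"] / part["trajectories"]


for name, part in PARTS.items():
    print(f"{name}: ~{mean_length(part):,.0f} transitions per trajectory")
```

Under these rounded figures, a human trajectory averages roughly 6,700 transitions and a bot trajectory roughly 30,000, which suggests the bot games are several times longer on average.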
Taxonomy
Topics: Artificial Intelligence in Games · Data Visualization and Analytics · Data Stream Mining Techniques
