STARDATA: A StarCraft AI Research Dataset
Zeming Lin, Jonas Gehring, Vasil Khalidov, Gabriel Synnaeve

TL;DR
STARDATA is a dataset of 65,646 StarCraft replays with full game state data recorded every 3 frames, intended for machine learning research such as strategy classification, imitation learning, and forward modeling.
Contribution
This paper releases a large dataset of StarCraft replays with full game state data, extracted and stored with TorchCraft in a standardized format that can be used across operating systems and platforms.
Findings
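The dataset contains 65,646 replays with 1535 million frames and 496 million player actions.
Only valid, non-corrupted replays are included, and a number of heuristics were used to ensure data quality and diversity.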
Examples demonstrate the dataset's utility in multiple tasks.
Abstract
We release a dataset of 65,646 StarCraft replays that contains 1535 million frames and 496 million player actions. We provide full game state data along with the original replays, which can be viewed in StarCraft. The game state data was recorded every 3 frames, which ensures suitability for a wide variety of machine learning tasks such as strategy classification, inverse reinforcement learning, imitation learning, forward modeling, partial information extraction, and others. We use TorchCraft to extract and store the data, which standardizes the data format for both reading from replays and reading directly from the game. Furthermore, the data can be used on different operating systems and platforms. The dataset contains only valid, non-corrupted replays, and its quality and diversity were ensured by a number of heuristics. We illustrate the diversity of the data with various statistics and…
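To give a feel for the scale quoted in the abstract, the following is a minimal Python sketch that derives per-replay averages from the headline numbers and shows what a 3-frame recording stride looks like. The totals are the rounded figures from the abstract, so the averages are approximate. The function names and the choice to start sampling at frame 0 are assumptions made for illustration; they are not taken from the paper or from TorchCraft.

```python
# Headline figures as quoted in the abstract (totals are rounded to the million).
NUM_REPLAYS = 65_646
TOTAL_FRAMES = 1_535_000_000
TOTAL_ACTIONS = 496_000_000
RECORD_STRIDE = 3  # game state was recorded every 3 frames


def per_replay_averages(num_replays, total_frames, total_actions):
    """Return (average frames per replay, average player actions per replay)."""
    return total_frames / num_replays, total_actions / num_replays


def recorded_frame_indices(num_game_frames, stride=RECORD_STRIDE, start=0):
    """List the frame indices kept when sampling every `stride` frames.

    Assumption: sampling begins at frame `start` (0 here); the abstract
    does not say which frame the recording starts on.
    """
    return list(range(start, num_game_frames, stride))


avg_frames, avg_actions = per_replay_averages(
    NUM_REPLAYS, TOTAL_FRAMES, TOTAL_ACTIONS
)
print(f"average frames per replay:  {avg_frames:,.0f}")
print(f"average actions per replay: {avg_actions:,.0f}")
print("indices kept from a 10-frame span:", recorded_frame_indices(10))
```

With these inputs the averages come out on the order of tens of thousands of frames and several thousand player actions per replay. Whether the 1535 million total counts every game frame or only the recorded ones is not stated in the abstract, so the sketch does not convert between the two.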
Taxonomy
Topics: Artificial Intelligence in Games · Digital Games and Media · Reinforcement Learning in Robotics
