Benchmarking the Spectrum of Agent Capabilities
Danijar Hafner

TL;DR
This paper introduces Crafter, an open-world survival game that serves as a single benchmark environment. It evaluates diverse agent capabilities through semantically meaningful achievements, with the aim of promoting research on generalization, exploration, and long-term reasoning.
Contribution
The paper presents Crafter, a novel unified benchmark environment for assessing a wide range of agent abilities in a single setting, reducing the need for multiple specialized benchmarks.
Findings
Crafter effectively evaluates general abilities in a complex environment.
Reward-maximizing agents exhibit sophisticated behaviors like building and resource management.
Baseline scores demonstrate the benchmark's suitability for future research.
Abstract
Evaluating the general abilities of intelligent agents requires complex simulation environments. Existing benchmarks typically evaluate only one narrow task per environment, requiring researchers to perform expensive training runs on many different environments. We introduce Crafter, an open world survival game with visual inputs that evaluates a wide range of general abilities within a single environment. Agents either learn from the provided reward signal or through intrinsic objectives and are evaluated by semantically meaningful achievements that can be unlocked during each episode, such as discovering resources and crafting tools. Consistently unlocking all achievements requires strong generalization, deep exploration, and long-term reasoning. We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baseline scores of reward agents…
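The achievement-based evaluation described in the abstract can be illustrated with a small sketch. It assumes each episode is summarized as the set of achievements unlocked at least once, computes a per-achievement success rate in percent, and then aggregates the rates with a geometric mean of (1 + rate) minus 1. That aggregate is believed to match how the Crafter score is defined, but the abstract above does not state it, so treat it as an assumption. The function names and the episode data are hypothetical and only for illustration.

```python
import math


def success_rates(episodes, achievements):
    """Percentage of episodes in which each achievement was unlocked at least once."""
    total = len(episodes)
    return {
        name: 100.0 * sum(name in ep for ep in episodes) / total
        for name in achievements
    }


def aggregate_score(rates):
    """Geometric mean of (1 + rate) minus 1, with rates given in percent.

    Assumed aggregation: averaging in log space rewards unlocking rare
    achievements more than pushing already-common ones closer to 100%.
    """
    logs = [math.log(1.0 + r) for r in rates.values()]
    return math.exp(sum(logs) / len(logs)) - 1.0


# Hypothetical evaluation data: one set of unlocked achievements per episode.
achievements = ["collect_wood", "place_table", "make_wood_pickaxe"]
episodes = [
    {"collect_wood", "place_table"},
    {"collect_wood"},
    {"collect_wood", "place_table", "make_wood_pickaxe"},
    set(),
]

rates = success_rates(episodes, achievements)
score = aggregate_score(rates)
print(rates)
print(round(score, 2))
```

Under this aggregation, an agent that never unlocks anything scores 0, and one that unlocks every achievement in every episode scores 100, so the result stays on the same percent scale as the individual success rates.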
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Games
