Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes
Tim Merino, Sam Earle, Ryunosuke Iwai, Julian Togelius, Edoardo Cetin

TL;DR
Dream-Cubed introduces a large-scale Minecraft voxel dataset and models that generate interactive 3D worlds efficiently, enabling semantic, controllable, and user-interactive environment creation.
Contribution
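The paper presents what the authors describe as the first large-scale study of 3D diffusion models for voxel generation, built on a new large-scale Minecraft dataset and an analysis of diffusion formulations, data compositions, and architectural design choices.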
Findings
Models operate directly in block space, which makes generation efficient and semantically grounded.
Quantitative evaluation uses an adapted FID metric alongside human preference studies.
The dataset, code, and pretrained models are released for future research.
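For readers unfamiliar with FID, the sketch below computes the standard Fréchet distance between two sets of feature vectors, which is the quantity FID is built on. It is only an illustration: how the paper adapts the metric to voxel data (in particular, which feature extractor it uses) is not described in this summary, so the random arrays and the function name `frechet_distance` are assumptions made for the example.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    In a real FID pipeline these would come from a feature extractor;
    here they are placeholders (assumption).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices (up to floating-point noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Placeholder "features": 200 samples with 4 dimensions each.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
same = frechet_distance(real, real)           # approximately 0
shifted = frechet_distance(real, real + 3.0)  # approximately 4 * 3**2 = 36
```

Identical sets give a distance near zero, and shifting every feature by a constant adds only the squared mean difference, since the covariances are unchanged.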
Abstract
We introduce Dream-Cubed, a large-scale dataset of Minecraft worlds at voxel resolution, and a family of models using cubes as powerful compositional units for efficient generation of interactive 3D environments. Dream-Cubed comprises tens of billions of tokens from a carefully curated mixture of procedural biome terrain and high-quality human-authored maps. We use this dataset to conduct the first large-scale study of 3D diffusion models for voxel generation, analyzing discrete and continuous diffusion formulations, data compositions, and architectural design choices. Our models operate directly in the space of blocks, enabling efficient and semantically grounded generation while supporting interactive user workflows such as inpainting and outpainting from user-authored blocks. To quantitatively evaluate our models, we adapt the FID metric to assess semantic differences between real…
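To make the idea of inpainting from user-authored blocks concrete, here is a toy sketch that treats a chunk as a 3D grid of integer block ids, keeps the user's blocks fixed, and fills the remaining voxels over a few steps. This is not the paper's method: the `MASK` id, the vocabulary size, the reveal schedule, and `toy_denoiser` (a random stand-in for a trained model) are all hypothetical and chosen only for illustration.

```python
import numpy as np

MASK = -1            # hypothetical id for "not yet generated"
NUM_BLOCK_TYPES = 8  # hypothetical block vocabulary size

def toy_denoiser(grid, rng):
    """Stand-in for a trained model: proposes a block id for every voxel."""
    return rng.integers(0, NUM_BLOCK_TYPES, size=grid.shape)

def inpaint(user_grid, known, steps=4, seed=0):
    """Fill voxels where `known` is False; never change user-authored voxels."""
    rng = np.random.default_rng(seed)
    grid = np.where(known, user_grid, MASK)
    for step in range(steps):
        masked = np.argwhere(grid == MASK)
        if len(masked) == 0:
            break
        proposal = toy_denoiser(grid, rng)
        # Reveal an equal share of the remaining voxels each step;
        # on the final step the divisor is 1, so everything left is filled.
        n_reveal = max(1, len(masked) // (steps - step))
        chosen = masked[rng.choice(len(masked), size=n_reveal, replace=False)]
        idx = tuple(chosen.T)
        grid[idx] = proposal[idx]
    return grid

# A 4x4x4 chunk where the user has authored only the bottom layer (block id 3).
user_grid = np.zeros((4, 4, 4), dtype=np.int64)
known = np.zeros((4, 4, 4), dtype=bool)
user_grid[:, 0, :] = 3
known[:, 0, :] = True
result = inpaint(user_grid, known)
```

Because only masked positions are ever written, the user's blocks survive unchanged. Outpainting can be viewed the same way, with the known region placed at the edge of a larger grid.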
