Diffusion Models Are Real-Time Game Engines

Dani Valevski; Yaniv Leviathan; Moab Arar; Shlomi Fruchter

arXiv:2408.14837 · cs.LG · April 25, 2025 · 3 cites

Open Access · 3 Reviews

TL;DR

GameNGen is a game engine powered entirely by a neural model. Trained on DOOM, it supports real-time, high-quality interaction with a complex environment over long trajectories.

Contribution

This work introduces GameNGen, the first neural game engine that enables real-time interaction and long trajectory simulation using diffusion models trained on gameplay data.

Findings

01

Runs at 20 FPS on a TPU

02

Achieves PSNR of 29.4 for next frame prediction

03

Human raters struggle to distinguish real from generated clips
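
To make the PSNR figure in finding 02 more concrete, the following is a minimal pure-Python sketch of the standard PSNR formula, assuming 8-bit pixel values (peak value 255). The function names `psnr` and `rmse_for_psnr` are illustrative and do not come from the paper, and the paper's exact evaluation protocol may differ.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(generated):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixels
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_for_psnr(psnr_db, max_value=255.0):
    """Invert the PSNR formula: root-mean-square pixel error implied by a PSNR value."""
    return max_value / (10 ** (psnr_db / 20))
```

Under the 8-bit assumption, `rmse_for_psnr(29.4)` comes out at roughly 8.6 out of 255, which gives a feel for the per-pixel error behind the reported number.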

Abstract

We present GameNGen, the first game engine powered entirely by a neural model that also enables real-time interaction with a complex environment over long trajectories at high quality. When trained on the classic game DOOM, GameNGen extracts gameplay and uses it to generate a playable environment that can interactively simulate new trajectories. GameNGen runs at 20 frames per second on a single TPU and remains stable over extended multi-minute play sessions. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation, even after 5 minutes of auto-regressive generation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the…
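
The auto-regressive generation mentioned in the abstract can be illustrated with a toy rollout loop. This is only a sketch under stated assumptions: it assumes each new frame is produced from a fixed-length window of recent frames and player actions, and `rollout`, `toy_next_frame`, and `context_len` are hypothetical names, not GameNGen's model or API. The stand-in "model" is a one-line integer function, so the example shows only the feedback structure, where each generated frame becomes input for the next step.

```python
from collections import deque


def rollout(next_frame_fn, initial_frames, actions, context_len=4):
    """Generate one frame per action, feeding each output back in as context."""
    if not initial_frames:
        raise ValueError("need at least one initial frame")
    # Sliding windows over the most recent frames and actions (assumed design)
    frame_context = deque(initial_frames[-context_len:], maxlen=context_len)
    action_context = deque(maxlen=context_len)
    generated = []
    for action in actions:
        action_context.append(action)
        frame = next_frame_fn(list(frame_context), list(action_context))
        generated.append(frame)
        # The model's own output becomes part of its future input
        frame_context.append(frame)
    return generated


def toy_next_frame(frames, actions):
    # Hypothetical stand-in for a learned model: a "frame" is an integer position
    return frames[-1] + actions[-1]


frames = rollout(toy_next_frame, initial_frames=[0], actions=[1, 1, -1, 2])
```

Because every step consumes earlier outputs, any per-frame error can carry forward, which is why stability over multi-minute sessions is a notable claim. At 20 frames per second, each pass through the loop has a budget of 1/20 s, or 50 ms.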

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 3

Strengths

Real-Time Performance: The paper demonstrates a model that runs at 20 frames per second, achieving performance close to real-time gaming on a TPU, which shows its practical deployment potential in high-demand applications.

Weaknesses

Lack of Novelty: The application relies on pre-existing models, primarily a Stable Diffusion variant, with incremental architectural adjustments. While the use of diffusion models in gaming is somewhat novel, the approach is more of an adaptation than a breakthrough innovation in game simulation.

Clarity: I found Section 2 difficult to understand. Could you please elaborate on the model inputs and clarify the regression objective? The mathematical symbols are a bit confusing; for example, could

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- The paper demonstrates for the first time that a neural network can simulate a relatively complex video game in real time. The motivation, rapid text- or image-programmable video game generation, is clear and convincing. I appreciate the amount of engineering that went into making this happen, which seemed far-fetched a year or two ago.
- The paper provides a plethora of metrics on model-generated image and video quality, including PSNR, LPIPS, FVD, and human evaluations.
- The paper provides

Weaknesses

- There is no methodological novelty to the paper, but given the remarkable findings this is not a problem.
- The model and code are not available to the public, so we cannot assess how robust the model and generated gameplay are. Since this is a phenomenological paper, this is more important than it is for typical ML papers.
- It is unclear how much of this amazing performance is due to "training data overfitting", and how well the model would perform on a sufficiently different DOOM map. The au

Reviewer 03 · Rating 5 · Confidence 3

Strengths

1. The first work to focus on interactive, playable, real-time simulation; an interesting idea.
2. Extensive experiments and a broad ablation study show the accuracy (at least visually) of the diffusion model's predictions and the efficiency of some design choices.

Weaknesses

I agree that a neural simulator is an interesting idea, but it would be good to show more (as you mention in your future work):
1. The same method generalized to more than one game; otherwise one might suspect that ViZDoom has some aspects that make it easy to learn (like the unchanging UI).
2. How a neural game simulator can be useful for downstream tasks, such as training an agent faster or giving the agent a strong forward model to help decision making.

Taxonomy

Topics: Distributed and Parallel Computing Systems · Artificial Intelligence in Games

Methods: Diffusion