Pixels to Play: A Foundation Model for 3D Gameplay

Yuguang Yue; Chris Green; Samuel Hunt; Irakli Salia; Wenzhe Shi; Jonathan J Hunt

arXiv:2508.14295 · cs.CV · August 21, 2025

TL;DR

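Pixels2Play-0.1 is a foundation model that learns to play 3D video games from the same pixel stream a human player sees. It is trained by behavior cloning on labeled human demonstrations plus unlabeled public videos whose actions are imputed by an inverse-dynamics model, and it aims to generalize to new titles with minimal game-specific engineering.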

Contribution

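The paper introduces Pixels2Play-0.1, a decoder-only transformer trained end-to-end to play multiple 3D games from pixels. Training uses behavior cloning on labeled demonstrations together with unlabeled videos labeled by an inverse-dynamics model, and the auto-regressive action output is designed to handle a large action space at low latency on a single consumer GPU.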

Findings

01. Competent play on Roblox and MS-DOS titles

02. Effective use of unlabeled videos for training

03. Potential for reaching expert-level control with further scaling

Abstract

We introduce Pixels2Play-0.1 (P2P0.1), a foundation model that learns to play a wide range of 3D video games with recognizable human-like behavior. Motivated by emerging consumer and developer use cases - AI teammates, controllable NPCs, personalized live-streamers, assistive testers - we argue that an agent must rely on the same pixel stream available to players and generalize to new titles with minimal game-specific engineering. P2P0.1 is trained end-to-end with behavior cloning: labeled demonstrations collected from instrumented human game-play are complemented by unlabeled public videos, to which we impute actions via an inverse-dynamics model. A decoder-only transformer with auto-regressive action output handles the large action space while remaining latency-friendly on a single consumer GPU. We report qualitative results showing competent play across simple Roblox and classic…
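
The data pipeline described in the abstract (labeled demonstrations plus unlabeled videos whose actions are imputed by an inverse-dynamics model) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the integer "observations", the three action names, the tabular model, and the function names `fit_inverse_dynamics`, `impute_actions`, and `build_bc_dataset` are all hypothetical. The real system works on pixels with a learned model.

```python
from collections import Counter, defaultdict


def fit_inverse_dynamics(labeled):
    """Fit a toy tabular inverse-dynamics model (IDM).

    `labeled` holds (obs, action, next_obs) triples. The model maps an
    observed change (next_obs - obs) to the action that most often
    produced that change in the labeled data.
    """
    votes = defaultdict(Counter)
    for obs, action, next_obs in labeled:
        votes[next_obs - obs][action] += 1
    return {delta: counts.most_common(1)[0][0] for delta, counts in votes.items()}


def impute_actions(idm, frames):
    """Label each consecutive frame pair of an unlabeled video with the IDM.

    Transitions the IDM has never seen are skipped rather than guessed.
    """
    imputed = []
    for obs, next_obs in zip(frames, frames[1:]):
        action = idm.get(next_obs - obs)
        if action is not None:
            imputed.append((obs, action, next_obs))
    return imputed


def build_bc_dataset(labeled, unlabeled_videos):
    """Combine labeled demonstrations with pseudo-labeled video transitions
    into (obs, action) pairs suitable for behavior cloning."""
    idm = fit_inverse_dynamics(labeled)
    dataset = [(obs, action) for obs, action, _ in labeled]
    for frames in unlabeled_videos:
        dataset += [(obs, action) for obs, action, _ in impute_actions(idm, frames)]
    return dataset


# Hypothetical toy data: observations are integer positions.
labeled_demos = [(0, "right", 1), (1, "right", 2), (2, "stay", 2), (2, "left", 1)]
unlabeled_video = [5, 6, 6, 5, 9]  # the final jump (+4) is unseen by the IDM
bc_dataset = build_bc_dataset(labeled_demos, [unlabeled_video])
```

In this sketch the unlabeled video contributes three pseudo-labeled pairs and the unseen transition is dropped, so the behavior-cloning set grows from four to seven examples. Skipping unknown transitions is one possible design choice; whether the paper filters imputed labels this way is not stated in the abstract.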

Taxonomy

Topics: Digital Games and Media · Augmented Reality Applications