Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Weihao Tan; Xiangyang Li; Yunhao Fang; Heyuan Yao; Shi Yan; Hao Luo; Tenglong Ao; Huihui Li; Hongbin Ren; Bairen Yi; Yujia Qin; Bo An; Libin Liu; Guang Shi

arXiv:2511.08892·cs.AI·November 13, 2025

TL;DR

Lumine is an open recipe for building generalist agents that complete complex, hours-long tasks in 3D open worlds in real time, using a vision-language model to unify perception, reasoning, and action.

Contribution

The paper introduces Lumine, which the authors describe as the first open recipe for developing generalist agents in 3D open worlds, and reports strong zero-shot cross-game generalization without fine-tuning.

Findings

01 Completes the entire five-hour Mondstadt main storyline in Genshin Impact

02 Generalizes zero-shot to Wuthering Waves and Honkai: Star Rail

03 Operates in real time, invoking reasoning only when necessary, across 3D open-world and 2D GUI tasks

Abstract

We introduce Lumine, the first open recipe for developing generalist agents capable of completing hours-long complex missions in real time within challenging 3D open-world environments. Lumine adopts a human-like interaction paradigm that unifies perception, reasoning, and action in an end-to-end manner, powered by a vision-language model. It processes raw pixels at 5 Hz to produce precise 30 Hz keyboard-mouse actions and adaptively invokes reasoning only when necessary. Trained in Genshin Impact, Lumine successfully completes the entire five-hour Mondstadt main storyline on par with human-level efficiency and follows natural language instructions to perform a broad spectrum of tasks in both 3D open-world exploration and 2D GUI manipulation across collection, combat, puzzle-solving, and NPC interaction. In addition to its in-domain performance, Lumine demonstrates strong zero-shot…
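The two rates quoted in the abstract imply a fixed ratio between perception and action steps. The sketch below works through that arithmetic only. It assumes, as an illustration rather than a statement of the authors' design, that each observed frame is paired with one evenly spaced group of actions; the function names are hypothetical and do not come from the paper.

```python
from fractions import Fraction

PERCEPTION_HZ = 5   # frame rate stated in the abstract
ACTION_HZ = 30      # keyboard-mouse action rate stated in the abstract


def actions_per_frame(perception_hz: int, action_hz: int) -> int:
    """Number of action steps that fit between two consecutive frames."""
    if action_hz % perception_hz != 0:
        raise ValueError("action rate must be an integer multiple of the perception rate")
    return action_hz // perception_hz


def action_timestamps_ms(frame_index: int,
                         perception_hz: int = PERCEPTION_HZ,
                         action_hz: int = ACTION_HZ) -> list[float]:
    """Start times (ms) of the actions paired with one frame.

    Assumption (not from the source): actions are evenly spaced and the
    first one starts at the moment the frame is observed.
    """
    n = actions_per_frame(perception_hz, action_hz)
    frame_start = Fraction(1000 * frame_index, perception_hz)  # ms
    step = Fraction(1000, action_hz)                           # ms per action
    return [float(frame_start + k * step) for k in range(n)]


if __name__ == "__main__":
    print(actions_per_frame(PERCEPTION_HZ, ACTION_HZ))  # 30 // 5 = 6
    print(action_timestamps_ms(1))  # six timestamps starting at 200.0 ms
```

Under these assumptions, a frame arrives every 200 ms and each action step lasts about 33 ms, so six action steps fall between consecutive frames.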

Code & Models

Datasets

TESS-Computer/minecraft-vla-stage3
dataset · 26 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning