World Craft: Agentic Framework to Create Visualizable Worlds via Text

Jianwen Sun; Yukang Feng; Kaining Ying; Chuanhao Li; Zizhen Li; Fanrui Zhang; Jiaxin Ai; Yifan Chang; Yu Dai; Yifei Huang; Kaipeng Zhang

arXiv:2601.09150 · cs.HC · January 30, 2026

TL;DR

World Craft is a framework that enables users to create visualizable, interactive worlds like AI Town through natural language descriptions, combining structured scaffolding and multi-agent analysis to improve customization and stability.

Contribution

The paper introduces a novel agentic framework with modules for structured scene development and intent analysis, enhancing user-driven environment creation without programming skills.

Findings

01 Outperforms existing code agents and LLMs in scene construction

02 Enhances spatial knowledge and layout stability

03 Provides scalable democratization of environment creation

Abstract

Large Language Models (LLMs) motivate generative agent simulation (e.g., AI Town) to create a "dynamic world", holding immense value across entertainment and research. However, non-experts, especially those without programming skills, cannot easily customize a visualizable environment by themselves. In this paper, we introduce World Craft, an agentic world creation framework that creates an executable and visualizable AI Town from users' textual descriptions. It consists of two main modules, World Scaffold and World Guild. World Scaffold is a structured and concise standardization for developing interactive game scenes, serving as efficient scaffolding for LLMs to customize an executable AI Town-like environment. World Guild is a multi-agent framework that progressively analyzes users' intents from rough descriptions and synthesizes the required structured contents (e.g., environment layout…
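
To make the idea of a structured scene standard more concrete, here is a minimal sketch of how a layout emitted by an LLM as JSON might be checked before it is rendered. It is not the World Scaffold format, which the abstract does not specify: the JSON fields (`width`, `height`, `objects`), the `Placement` class, and the `validate_scene` function are all hypothetical names chosen for illustration, and the only checks assumed are grid bounds and rectangle overlap.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One object on a tile grid (hypothetical schema, not from the paper)."""
    name: str
    x: int  # left tile column
    y: int  # top tile row
    w: int  # width in tiles
    h: int  # height in tiles


def overlaps(a, b):
    """True if the two axis-aligned rectangles share at least one tile."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def validate_scene(raw):
    """Parse a JSON scene description and return a list of layout errors.

    An empty list means every object fits on the grid and no two overlap.
    """
    scene = json.loads(raw)
    width, height = scene["width"], scene["height"]
    items = [Placement(**obj) for obj in scene["objects"]]

    errors = []
    # Bounds check: each rectangle must lie fully inside the grid.
    for p in items:
        if p.x < 0 or p.y < 0 or p.x + p.w > width or p.y + p.h > height:
            errors.append(f"{p.name}: out of bounds")
    # Pairwise overlap check.
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if overlaps(a, b):
                errors.append(f"{a.name} overlaps {b.name}")
    return errors


if __name__ == "__main__":
    scene = json.dumps({
        "width": 10, "height": 8,
        "objects": [
            {"name": "bed", "x": 0, "y": 0, "w": 2, "h": 3},
            {"name": "desk", "x": 1, "y": 1, "w": 2, "h": 1},
        ],
    })
    print(validate_scene(scene))
```

In a pipeline of this kind, the error list could be fed back to the generating agent as a correction prompt, so that a layout is only executed once it passes validation. Whether World Craft works this way is an assumption, not something the abstract states.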

Taxonomy

Topics: Artificial Intelligence in Games · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis