World Craft: Agentic Framework to Create Visualizable Worlds via Text
Jianwen Sun, Yukang Feng, Kaining Ying, Chuanhao Li, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Yifan Chang, Yu Dai, Yifei Huang, Kaipeng Zhang

TL;DR
World Craft is a framework that enables users to create visualizable, interactive worlds like AI Town through natural language descriptions, combining structured scaffolding and multi-agent analysis to improve customization and stability.
Contribution
The paper introduces a novel agentic framework with modules for structured scene development and intent analysis, enhancing user-driven environment creation without programming skills.
Findings
Outperforms existing code agents and LLMs in scene construction
Enhances spatial knowledge and layout stability
Provides scalable democratization of environment creation
Abstract
Large Language Models (LLMs) motivate generative agent simulation (e.g., AI Town) to create a "dynamic world", holding immense value across entertainment and research. However, for non-experts, especially those without programming skills, it is not easy to customize a visualizable environment by themselves. In this paper, we introduce World Craft, an agentic world creation framework that creates an executable and visualizable AI Town from users' textual descriptions. It consists of two main modules, World Scaffold and World Guild. World Scaffold is a structured and concise standard for developing interactive game scenes, serving as an efficient scaffold for LLMs to customize an executable AI Town-like environment. World Guild is a multi-agent framework that progressively analyzes users' intents from rough descriptions and synthesizes the required structured contents (e.g., environment layout…
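To make the idea of a structured scene description concrete, here is a minimal Python sketch of what a scaffold-style scene format and a layout check might look like. It is not the paper's schema or algorithm: the names `SceneObject`, `Scene`, `overlaps`, and `validate`, the tile-grid coordinates, and the two checks (bounds and overlap) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical fields; the paper's actual schema is not given in this summary.
    name: str
    x: int       # left edge, in tiles
    y: int       # top edge, in tiles
    width: int
    height: int


@dataclass
class Scene:
    # Hypothetical container for one generated scene on a tile grid.
    name: str
    width: int
    height: int
    objects: List[SceneObject]


def overlaps(a: SceneObject, b: SceneObject) -> bool:
    """Return True if the two axis-aligned rectangles share at least one tile."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def validate(scene: Scene) -> List[str]:
    """Collect layout problems; an empty list means both checks passed."""
    problems = []
    # Check 1: every object must lie fully inside the scene grid.
    for obj in scene.objects:
        if (obj.x < 0 or obj.y < 0
                or obj.x + obj.width > scene.width
                or obj.y + obj.height > scene.height):
            problems.append(f"{obj.name} is out of bounds")
    # Check 2: no two objects may occupy the same tile.
    for i, a in enumerate(scene.objects):
        for b in scene.objects[i + 1:]:
            if overlaps(a, b):
                problems.append(f"{a.name} overlaps {b.name}")
    return problems


# A deliberately flawed layout, as a text-to-scene step might produce.
cafe = Scene("cafe", 10, 8, [
    SceneObject("counter", 0, 0, 4, 2),
    SceneObject("table", 3, 1, 2, 2),   # shares a tile with the counter
    SceneObject("door", 9, 7, 2, 1),    # extends past the right edge
])
issues = validate(cafe)
for issue in issues:
    print(issue)
```

In a setup like this, the list of problems could be fed back to the generating agent as a correction prompt. That feedback loop is also an assumption, not something this summary states.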
Taxonomy
Topics: Artificial Intelligence in Games · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
