Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation
Lanshan He, Haozhou Pang, Qi Gan, Xin Shen, Ziwei Zhang, Yibo Liu, Gang Fang, Bo Liu, Kai Sheng, Shengfeng Zeng, Chaofan Li, Zhen Hui, Keer Zhou, Lan Zhou, Shujun Dai

TL;DR
This paper introduces Cutscene Agent, an LLM-based framework for automated 3D cutscene generation that integrates real-time game engine feedback, orchestrates multiple specialized agents, and provides a new benchmark for evaluation.
Contribution
It presents a novel multi-agent system with bidirectional engine integration and a hierarchical benchmark for complex cutscene generation tasks.
Findings
LLMs can generate coherent cutscenes with real-time scene observation.
The framework enables end-to-end automation of cinematic content creation.
CutsceneBench effectively evaluates multi-step, interdependent tool use in cutscene generation.
Abstract
Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional engagement. Producing cutscenes is inherently complex: it demands seamless coordination across screenwriting, cinematography, character animation, voice acting, and technical direction, often requiring days to weeks of collaborative effort from multidisciplinary teams to produce minutes of polished content. In this work, we present Cutscene Agent, an LLM agent framework for automated end-to-end cutscene generation. The framework makes three contributions: (1) a Cutscene Toolkit built on the Model Context Protocol (MCP) that establishes bidirectional integration between LLM agents and the game engine, in which agents not only invoke engine operations but also continuously observe real-time scene…
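The abstract describes bidirectional integration: agents both invoke engine operations and observe the resulting scene state. The following is a minimal sketch of that act-then-observe pattern under heavy assumptions. `SceneEngine`, `ToolRegistry`, `spawn_actor`, `move_actor`, and `run_plan` are hypothetical names; the "engine" is an in-memory dictionary rather than a real game engine; the plan is scripted rather than produced by an LLM; and no MCP SDK calls are used. It is not the paper's Cutscene Toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical position type: (x, y, z) in engine units.
Position = tuple[float, float, float]


@dataclass
class SceneEngine:
    """Hypothetical in-memory stand-in for a game engine (not the paper's toolkit)."""
    actors: dict[str, Position] = field(default_factory=dict)

    def snapshot(self) -> dict[str, Position]:
        # Observation channel: return a copy of the current scene state.
        return dict(self.actors)


def spawn_actor(engine: SceneEngine, actor_id: str, position: Position) -> None:
    if actor_id in engine.actors:
        raise ValueError(f"{actor_id} already exists")
    engine.actors[actor_id] = tuple(position)


def move_actor(engine: SceneEngine, actor_id: str, position: Position) -> None:
    if actor_id not in engine.actors:
        raise ValueError(f"{actor_id} is not in the scene")
    engine.actors[actor_id] = tuple(position)


class ToolRegistry:
    """Maps tool names to engine operations; every call reports the scene back."""

    def __init__(self, engine: SceneEngine) -> None:
        self.engine = engine
        self.tools: dict[str, Callable[..., None]] = {}

    def register(self, name: str, fn: Callable[..., None]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs) -> dict:
        if name not in self.tools:
            return {"ok": False, "error": f"unknown tool: {name}", "scene": self.engine.snapshot()}
        try:
            self.tools[name](self.engine, **kwargs)
        except ValueError as exc:
            # Errors are returned, not raised, so the agent can observe and replan.
            return {"ok": False, "error": str(exc), "scene": self.engine.snapshot()}
        return {"ok": True, "scene": self.engine.snapshot()}


def run_plan(registry: ToolRegistry, plan: list[tuple[str, dict]]) -> list[tuple[str, bool]]:
    """Execute steps in order and verify each one against the observed scene."""
    trace: list[tuple[str, bool]] = []
    for tool, kwargs in plan:
        result = registry.call(tool, **kwargs)
        trace.append((tool, result["ok"]))
        if not result["ok"]:
            # A real agent would feed result["error"] and result["scene"] back to the LLM.
            break
        actor = kwargs.get("actor_id")
        if actor is not None and "position" in kwargs:
            # Check the postcondition in the observed state instead of trusting the call.
            if result["scene"].get(actor) != tuple(kwargs["position"]):
                trace.append(("verify", False))
                break
    return trace


engine = SceneEngine()
tools = ToolRegistry(engine)
tools.register("spawn_actor", spawn_actor)
tools.register("move_actor", move_actor)

plan = [
    ("spawn_actor", {"actor_id": "hero", "position": (0, 0, 0)}),
    ("move_actor", {"actor_id": "hero", "position": (2, 0, 1)}),
    ("move_actor", {"actor_id": "villain", "position": (5, 0, 0)}),  # never spawned
]
trace = run_plan(tools, plan)
print(trace)  # [('spawn_actor', True), ('move_actor', True), ('move_actor', False)]
```

The design choice being illustrated is that every tool call returns a scene snapshot alongside its status, so a failed or silently wrong operation (here, moving an actor that was never spawned) surfaces to the agent as an observation it can reason about, rather than as a one-way command whose effect is never checked.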
