STAGE: A Full-Screenplay Benchmark for Reasoning over Evolving Stories
Qiuyu Tian, Zequn Liu, Yiding Li, Fengyi Chen, Zequn Liu, Youyong Kong, Fan Guo, Yuyao Li, Jinjing Shen, Zhijing Xie, Yiyun Luo, Xin Zhang, Yingce Xia

TL;DR
STAGE is a comprehensive benchmark for evaluating models' ability to understand, reason over, and generate coherent narratives from full-length movie screenplays across multiple tasks.
Contribution
It introduces a unified benchmark with four interconnected tasks, curated datasets, and annotations for holistic narrative understanding in both English and Chinese.
Findings
Provides datasets for 150 films in English and Chinese.
Enables evaluation of models' world-building and reasoning capabilities.
Supports multiple tasks including graph construction and character role-playing.
Abstract
Movie screenplays are rich long-form narratives that interleave complex character relationships, temporally ordered events, and dialogue-driven interactions. While prior benchmarks target individual subtasks such as question answering or dialogue generation, they rarely evaluate whether models can construct a coherent story world and use it consistently across multiple forms of reasoning and generation. We introduce STAGE (Screenplay Text, Agents, Graphs and Evaluation), a unified benchmark for narrative understanding over full-length movie screenplays. STAGE defines four tasks: knowledge graph construction, scene-level event summarization, long-context screenplay question answering, and in-script character role-playing, all grounded in a shared narrative world representation. The benchmark provides cleaned scripts, curated knowledge graphs, and event- and character-centric annotations…
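To make the idea of a shared narrative world representation concrete, here is a minimal Python sketch of a scene-indexed knowledge graph that several tasks could read from. It is an illustration only: the class names, fields, relation labels, and example characters are all hypothetical and are not taken from the STAGE data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One knowledge-graph edge, tagged with the scene that introduces it."""
    head: str
    relation: str
    tail: str
    scene_id: int


@dataclass
class StoryWorld:
    """Hypothetical shared world state; NOT the actual STAGE schema."""
    relations: list[Relation] = field(default_factory=list)
    scene_summaries: dict[int, str] = field(default_factory=dict)

    def add_relation(self, head: str, relation: str, tail: str, scene_id: int) -> None:
        self.relations.append(Relation(head, relation, tail, scene_id))

    def relations_up_to(self, scene_id: int) -> list[Relation]:
        """Edges established by the end of the given scene."""
        return [r for r in self.relations if r.scene_id <= scene_id]

    def neighbors(self, character: str, up_to_scene: int) -> set[str]:
        """Entities linked to a character at a given point in the script."""
        linked = set()
        for r in self.relations_up_to(up_to_scene):
            if r.head == character:
                linked.add(r.tail)
            elif r.tail == character:
                linked.add(r.head)
        return linked


# Made-up example data, for illustration only.
world = StoryWorld()
world.add_relation("Alice", "sibling_of", "Bob", scene_id=1)
world.add_relation("Alice", "betrays", "Carol", scene_id=7)
world.scene_summaries[1] = "Alice and Bob meet at the station."
```

Indexing every edge by scene lets a question-answering or role-playing component query the world as it stood at a particular point in the story, rather than only its final state.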
