STAGE: A Full-Screenplay Benchmark for Reasoning over Evolving Stories

Qiuyu Tian; Zequn Liu; Yiding Li; Fengyi Chen; Zequn Liu; Youyong Kong; Fan Guo; Yuyao Li; Jinjing Shen; Zhijing Xie; Yiyun Luo; Xin Zhang; Yingce Xia

arXiv:2601.08510 · cs.CL · May 20, 2026

TL;DR

STAGE is a comprehensive benchmark for evaluating models' ability to understand, reason over, and generate coherent narratives from full-length movie screenplays across multiple tasks.

Contribution

It introduces a unified benchmark with four interconnected tasks, curated datasets, and annotations for holistic narrative understanding in both English and Chinese.

Findings

01. Provides datasets for 150 films in English and Chinese.

02. Enables evaluation of models' world-building and reasoning capabilities.

03. Supports multiple tasks including graph construction and character role-playing.

Abstract

Movie screenplays are rich long-form narratives that interleave complex character relationships, temporally ordered events, and dialogue-driven interactions. While prior benchmarks target individual subtasks such as question answering or dialogue generation, they rarely evaluate whether models can construct a coherent story world and use it consistently across multiple forms of reasoning and generation. We introduce STAGE (Screenplay Text, Agents, Graphs and Evaluation), a unified benchmark for narrative understanding over full-length movie screenplays. STAGE defines four tasks: knowledge graph construction, scene-level event summarization, long-context screenplay question answering, and in-script character role-playing, all grounded in a shared narrative world representation. The benchmark provides cleaned scripts, curated knowledge graphs, and event- and character-centric annotations…
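To make the idea of a shared narrative world representation more concrete, here is a minimal sketch of a scene-indexed relation store that several tasks could query. It is an illustration only, not the benchmark's actual schema: the names `Relation`, `StoryWorld`, `as_of`, and `neighbors`, as well as the example characters and relation labels, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One edge of a hypothetical story-world graph."""
    head: str    # source entity, e.g. a character name
    label: str   # relation type, e.g. "sibling_of"
    tail: str    # target entity
    scene: int   # index of the scene where the relation is established


@dataclass
class StoryWorld:
    """Hypothetical container for relations that accumulate as the story unfolds."""
    relations: list[Relation] = field(default_factory=list)

    def add(self, head: str, label: str, tail: str, scene: int) -> None:
        self.relations.append(Relation(head, label, tail, scene))

    def as_of(self, scene: int) -> list[Relation]:
        """Return the relations established at or before the given scene."""
        return [r for r in self.relations if r.scene <= scene]

    def neighbors(self, entity: str, scene: int) -> list[str]:
        """Return, in sorted order, the entities linked to `entity` by that scene."""
        linked = set()
        for r in self.as_of(scene):
            if r.head == entity:
                linked.add(r.tail)
            elif r.tail == entity:
                linked.add(r.head)
        return sorted(linked)


# Illustrative data only; not taken from any screenplay in the benchmark.
world = StoryWorld()
world.add("Alice", "sibling_of", "Bob", scene=1)
world.add("Alice", "betrays", "Carol", scene=5)
print(world.neighbors("Alice", scene=1))
print(world.neighbors("Alice", scene=5))
```

Indexing each relation by scene lets a question-answering or role-playing component restrict itself to what is known at a given point in the story, which is one plausible reading of "evolving" in the title.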
