DramaBench: A Six-Dimensional Evaluation Framework for Drama Script Continuation

Shijian Ma; Yunqi Huang; Yan Lin

arXiv:2512.19012·cs.CL·December 24, 2025

TL;DR

DramaBench introduces a comprehensive six-dimensional benchmark for evaluating drama script continuation, addressing key aspects like character consistency and emotional depth, with extensive evaluation of state-of-the-art models and human validation.

Contribution

DramaBench is the first large-scale benchmark to evaluate drama script continuation across six independent dimensions, combining rule-based analysis with LLM-based labeling and statistical metrics for objective, reproducible assessment.

Findings

01. 8 state-of-the-art models evaluated on 1,103 scripts (8,824 evaluations in total)

02. 65.9% of the 252 pairwise comparisons show statistically significant differences

03. Human validation on 188 scripts shows substantial agreement on 3 of 5 dimensions

Abstract

Drama script continuation requires models to maintain character consistency, advance plot coherently, and preserve dramatic structure, capabilities that existing benchmarks fail to evaluate comprehensively. We present DramaBench, the first large-scale benchmark for evaluating drama script continuation across six independent dimensions: Format Standards, Narrative Efficiency, Character Consistency, Emotional Depth, Logic Consistency, and Conflict Handling. Our framework combines rule-based analysis with LLM-based labeling and statistical metrics, ensuring objective and reproducible evaluation. We conduct a comprehensive evaluation of 8 state-of-the-art language models on 1,103 scripts (8,824 evaluations in total), with rigorous statistical significance testing (252 pairwise comparisons, 65.9% significant) and human validation (188 scripts, substantial agreement on 3/5 dimensions). Our ablation…
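
The counts quoted in the abstract can be cross-checked with simple arithmetic. The Python sketch below is only an illustration: the variable names are hypothetical, and reading the 252 comparisons as 28 model pairs times 9 comparisons per pair is an inference from the numbers, not something the abstract states.

```python
from math import comb

# Figures taken directly from the abstract.
n_models = 8
n_scripts = 1103
n_comparisons = 252
significant_share = 0.659

# Each model continues each script once, giving the reported evaluation total.
total_evaluations = n_models * n_scripts

# Number of unordered model pairs among 8 models.
model_pairs = comb(n_models, 2)

# ASSUMPTION: the 252 comparisons split evenly across model pairs.
# The abstract does not say what the per-pair comparisons correspond to.
comparisons_per_pair = n_comparisons / model_pairs

# Approximate count of significant comparisons implied by the rounded 65.9%.
significant_count = round(significant_share * n_comparisons)

print(total_evaluations, model_pairs, comparisons_per_pair, significant_count)
```

Under these assumptions the evaluation total matches the 8,824 figure in the abstract, and the rounded percentage corresponds to roughly 166 of the 252 comparisons.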

Code & Models

Models

🤗 FutureMa/Qwen3-8B-Drama-Thinking (model · 57 downloads · 56 likes)

Taxonomy

Topics: Artificial Intelligence in Games · Topic Modeling · Mental Health via Writing