Comparative Study of Large Language Models on Chinese Film Script Continuation: An Empirical Analysis Based on GPT-5.2 and Qwen-Max

Yuxuan Cao; Zida Yang; Ye Wang

arXiv:2601.14826 · cs.CL · January 22, 2026



TL;DR

This study systematically compares GPT-5.2 and Qwen-Max on Chinese film script continuation, revealing GPT-5.2's superior performance in structural coherence and overall quality through a novel benchmark and multi-dimensional evaluation framework.

Contribution

It introduces the first Chinese film script continuation benchmark and a comprehensive evaluation framework for assessing LLMs in culturally specific creative writing tasks.

Findings

01 GPT-5.2 outperforms Qwen-Max in structural preservation and overall quality.

02 Qwen-Max has marginally higher ROUGE-L scores but lower structural and qualitative scores.

03 GPT-5.2 demonstrates better character consistency and tone matching.

Abstract

As large language models (LLMs) are increasingly applied to creative writing, their performance on culturally specific narrative tasks warrants systematic investigation. This study constructs the first Chinese film script continuation benchmark comprising 53 classic films, and designs a multi-dimensional evaluation framework comparing GPT-5.2 and Qwen-Max-Latest. Using a "first half to second half" continuation paradigm with 3 samples per film, we obtained 303 valid samples (GPT-5.2: 157, 98.7% validity; Qwen-Max: 146, 91.8% validity). Evaluation integrates ROUGE-L, Structural Similarity, and LLM-as-Judge scoring (DeepSeek-Reasoner). Statistical analysis of 144 paired samples reveals: Qwen-Max achieves marginally higher ROUGE-L (0.2230 vs 0.2114, d=-0.43); however, GPT-5.2 significantly outperforms in structural preservation (0.93 vs 0.75, d=0.46), overall quality (44.79 vs 25.72,…
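
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It makes three assumptions: ROUGE-L is computed as an LCS-based F1 over characters (the abstract does not state the tokenization), the effect size d is the paired form (mean of differences divided by their standard deviation), and the validity denominator is 53 films × 3 samples = 159 per model, which matches the reported 98.7% and 91.8%. All function names are hypothetical.

```python
from statistics import mean, stdev


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over characters (assumed tokenization for Chinese text)."""
    cand, ref = list(candidate), list(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def paired_cohens_d(xs, ys):
    """Paired effect size (assumed variant): mean(x - y) / sd(x - y)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / stdev(diffs)


def validity_rate(valid, films=53, samples_per_film=3):
    """Share of valid continuations out of films * samples_per_film attempts."""
    return valid / (films * samples_per_film)


if __name__ == "__main__":
    print(f"GPT-5.2 validity:  {validity_rate(157):.1%}")
    print(f"Qwen-Max validity: {validity_rate(146):.1%}")
    print(f"ROUGE-L example:   {rouge_l_f1('他走了', '他走'):.2f}")
```

Under this paired definition the sign of d follows the order of subtraction, so a negative d for ROUGE-L alongside positive d values for the other metrics would be consistent with a GPT-5.2 minus Qwen-Max difference, though the abstract does not say so explicitly.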


Taxonomy

Topics: Artistic and Creative Research · Artificial Intelligence in Games · Mental Health via Writing