SummScreen: A Dataset for Abstractive Screenplay Summarization
Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel

TL;DR
SummScreen is a new dataset for abstractive summarization of TV series transcripts. It highlights the challenge of capturing plot details and character information, and the authors evaluate several models on it, including neural and nearest-neighbor approaches.
Contribution
Introduces SummScreen, a challenging dataset for TV series summarization, along with entity-centric evaluation metrics and an analysis of model performance.
Findings
An oracle extractive approach outperforms neural models on automatic metrics.
Neural models struggle to fully exploit input transcripts.
Models can generate unfaithful facts, indicating room for improvement.
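For readers unfamiliar with the term, an "oracle" extractive approach selects source content using the reference summary itself, so it serves as an upper-bound-style reference rather than a deployable system. The sketch below is a common greedy variant, not necessarily the oracle used in the paper: it assumes the transcript is a list of utterances, uses a clipped unigram F1 as a stand-in for ROUGE, and the names `unigram_f1` and `greedy_oracle` are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep word characters only (a simplifying assumption).
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate: List[str], reference: List[str]) -> float:
    # Clipped unigram overlap F1, similar in spirit to ROUGE-1 F1.
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(utterances: List[str], recap: str, max_utterances: int = 5) -> List[int]:
    # Hypothetical greedy oracle: repeatedly add the utterance that most
    # improves overlap with the human-written recap; stop when none helps.
    reference = tokenize(recap)
    tokenized = [tokenize(u) for u in utterances]
    selected: List[int] = []
    current: List[str] = []
    best_score = 0.0
    while len(selected) < max_utterances:
        best_idx = None
        for i, tokens in enumerate(tokenized):
            if i in selected:
                continue
            score = unigram_f1(current + tokens, reference)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            break
        selected.append(best_idx)
        current += tokenized[best_idx]
    # Return indices in transcript order.
    return sorted(selected)
```

Usage with made-up utterances: `greedy_oracle(["Alice finds the key.", "Bob makes a joke.", "Alice opens the vault."], "Alice finds the key and opens the vault.")` selects utterances 0 and 2 and skips the comic-relief line, which mirrors the abstract's point that such content rarely appears in recaps.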
Abstract
We introduce SummScreen, a summarization dataset consisting of pairs of TV series transcripts and human-written recaps. The dataset provides a challenging testbed for abstractive summarization for several reasons. Plot details are often expressed indirectly in character dialogues and may be scattered across the entirety of the transcript. These details must be found and integrated to form the succinct plot descriptions in the recaps. Also, TV scripts contain content that does not directly pertain to the central plot but rather serves to develop characters or provide comic relief. This information is rarely contained in recaps. Since characters are fundamental to TV series, we also propose two entity-centric evaluation metrics. Empirically, we characterize the dataset by evaluating several methods, including neural models and those based on nearest neighbors. An oracle extractive approach…
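The abstract mentions two entity-centric evaluation metrics without defining them. As a minimal sketch of what an entity-centric metric can look like, the code below assumes a known list of character names and scores a generated recap by precision, recall, and F1 over the set of characters it mentions versus the reference recap. This is an illustration under that assumption, not the paper's definition; `mentioned_characters` and `character_prf` are hypothetical names.

```python
import re
from typing import Iterable, Set, Tuple


def mentioned_characters(text: str, characters: Iterable[str]) -> Set[str]:
    # A name counts as mentioned if it appears as a whole word or phrase
    # (case-insensitive); aliases and pronouns are ignored in this sketch.
    found = set()
    for name in characters:
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
            found.add(name)
    return found


def character_prf(generated: str, reference: str, characters: Iterable[str]) -> Tuple[float, float, float]:
    # Set-level precision/recall/F1 of character mentions.
    names = list(characters)
    gen = mentioned_characters(generated, names)
    ref = mentioned_characters(reference, names)
    overlap = len(gen & ref)
    precision = overlap / len(gen) if gen else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With hypothetical characters `["Alice", "Bob", "Carol"]`, a generated recap "Alice and Carol argue." scored against the reference "Bob helps Carol while Alice sulks." gets precision 1.0, recall 2/3, and F1 0.8: every character it mentions is correct, but it omits Bob.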
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
