Summarization of Films and Documentaries Based on Subtitles and Scripts

Marta Aparício; Paulo Figueiredo; Francisco Raposo; David Martins de Matos; Ricardo Ribeiro; Luís Marujo

arXiv:1506.01273·cs.CL·March 10, 2016

TL;DR

This paper evaluates generic text summarization algorithms on films and documentaries, using three datasets (news articles, film scripts and subtitles, and documentary subtitles) and taking the well-known behavior of news article summarization as the reference.

Contribution

It shows that existing generic summarization algorithms can be applied to the text of films and documentaries (scripts and subtitles), and reports their relative performance, measured with ROUGE, across the three content types.

Findings

01. LSA performs best for news articles and documentaries.

02. LexRank and Support Sets perform best for films.

03. The relative behavior of the algorithms on films and documentaries is in accordance with that obtained for news articles.

Abstract

We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot summaries, and synopses. We show that the best performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles.
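The abstract relies on ROUGE to compare generated summaries against reference texts (news abstracts, plot summaries, synopses). As a rough illustration of what such a score measures, here is a minimal sketch of ROUGE-N recall, assuming lowercase whitespace tokenization and no stemming or stopword removal. The function names `ngrams` and `rouge_n_recall` are hypothetical, and this is not the evaluation setup used in the paper, which reports standard ROUGE metrics.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: the fraction of reference n-grams
    that also occur in the candidate, with clipped counts.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it
    # appears in the candidate (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n_recall(generated, reference, n=1))  # unigram recall
    print(rouge_n_recall(generated, reference, n=2))  # bigram recall
```

In this toy example five of the six reference unigrams and three of the five reference bigrams are recovered, so the two scores are 5/6 and 3/5. Real evaluations typically add stemming, multiple references, and precision/F-measure variants.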
