TL;DR
This paper summarizes full-length movies by building a sparse, multimodal scene graph and using it to identify turning-point scenes. Human judges rate the resulting summaries as more informative and complete, and the induced graphs show genre-specific topology.
Contribution
It proposes a graph-based model that identifies turning-point (TP) scenes from multimodal information. In human evaluation, its summaries are rated higher than those of sequence-based models and general-purpose summarization algorithms.
Findings
Human judges rate the summaries as more informative and complete than the outputs of sequence-based models and general-purpose summarization algorithms, and give them higher ratings overall.
The induced graphs are interpretable and display different topologies for different movie genres.
Abstract
We summarize full-length movies by creating shorter videos containing their most informative scenes. We explore the hypothesis that a summary can be created by assembling scenes which are turning points (TPs), i.e., key events in a movie that describe its storyline. We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information. According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms. The induced graphs are interpretable, displaying different topology for different movie genres.
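The idea of a sparse scene graph feeding scene selection can be illustrated with a minimal sketch. This is not the authors' model: it swaps in a plain k-nearest-neighbour graph over cosine similarity, and it assumes each scene already has a feature vector and a turning-point score. The names `sparse_scene_graph`, `select_summary_scenes`, `tp_scores`, `k`, and `budget` are hypothetical and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sparse_scene_graph(scene_vectors, k=2):
    """Keep only each scene's k most similar scenes as neighbours.

    Assumption: a kNN graph stands in for the graph induced in the paper.
    """
    graph = {}
    for i, vec in enumerate(scene_vectors):
        sims = [(cosine(vec, other), j)
                for j, other in enumerate(scene_vectors) if j != i]
        sims.sort(key=lambda pair: (-pair[0], pair[1]))  # most similar first
        graph[i] = [j for _, j in sims[:k]]
    return graph

def select_summary_scenes(graph, tp_scores, budget):
    """Average each scene's score with its neighbours' scores, then keep
    the `budget` best scenes in chronological order.

    Assumption: tp_scores are hypothetical per-scene turning-point scores.
    """
    smoothed = {}
    for i, neighbours in graph.items():
        if neighbours:
            neighbour_mean = sum(tp_scores[j] for j in neighbours) / len(neighbours)
        else:
            neighbour_mean = tp_scores[i]
        smoothed[i] = 0.5 * tp_scores[i] + 0.5 * neighbour_mean
    ranked = sorted(smoothed, key=lambda i: (-smoothed[i], i))
    return sorted(ranked[:budget])

# Toy example: four scenes with 2-d features (made-up values).
scene_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = sparse_scene_graph(scene_vectors, k=1)
summary = select_summary_scenes(graph, tp_scores=[0.1, 0.8, 0.3, 0.9], budget=2)
```

Returning the selected scenes in chronological order reflects that the summary is a shorter video assembled from the original scenes.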
