Learning Trailer Moments in Full-Length Movies
Lezi Wang, Dong Liu, Rohit Puri, and Dimitris N. Metaxas

TL;DR
This paper introduces a weakly supervised method for detecting key moments in full-length movies, using officially released trailers as guidance in place of manual annotations. Its ranking network combines a co-attention module with a contrastive attention module and is reported to outperform supervised methods.
Contribution
It presents the first movie-trailer dataset and a novel ranking network with co-attention and contrastive attention, enabling key moment detection without manual annotations.
Findings
The proposed method outperforms supervised approaches on key moment detection.
The Contrastive Attention module improves feature representation and detection accuracy.
The new dataset facilitates research in weakly supervised movie understanding.
Abstract
A movie's key moments stand out from the screenplay to grab an audience's attention and make movie browsing efficient. But a lack of annotations makes the existing approaches inapplicable to movie key moment detection. To remove the need for human annotations, we leverage the officially released trailers as weak supervision to learn a model that can detect the key moments in full-length movies. We introduce a novel ranking network that utilizes the Co-Attention between movies and trailers as guidance to generate the training pairs, where the moments highly correlated with trailers are expected to be scored higher than the uncorrelated moments. Additionally, we propose a Contrastive Attention module to enhance the feature representations such that the comparative contrast between features of the key and non-key moments is maximized. We construct the first movie-trailer dataset, and the…
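The pair-generation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that co-attention can be approximated by the maximum cosine similarity between a movie clip and any trailer clip, and that the ranking objective is a standard pairwise hinge loss. The function names (`co_attention`, `make_pairs`, `ranking_loss`), the parameter `k`, the margin value, and the toy feature vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def co_attention(movie_clips, trailer_clips):
    """Assumed proxy: each movie clip's highest similarity to any trailer clip."""
    return [max(cosine(m, t) for t in trailer_clips) for m in movie_clips]

def make_pairs(attention, k):
    """Pair each of the k most trailer-correlated clips with each of the k least."""
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return [(p, n) for p in order[:k] for n in order[-k:]]

def ranking_loss(scores, pairs, margin=1.0):
    """Mean hinge loss: a positive clip should outscore its negative by `margin`."""
    losses = [max(0.0, margin - scores[p] + scores[n]) for p, n in pairs]
    return sum(losses) / len(losses)

# Toy 2-D clip features (hypothetical): four movie clips, one trailer clip.
movie = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
trailer = [[1.0, 0.0]]

attn = co_attention(movie, trailer)
pairs = make_pairs(attn, k=1)  # clip 0 is most correlated, clip 3 least
```

In this sketch the attention values only decide which clips form a (positive, negative) pair; the scores fed to `ranking_loss` would come from a separate learned model, which is omitted here.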
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
