Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization
Zhiyi Zhu, Xiaoyu Wu, Youwei Lu

TL;DR
This paper introduces a cross-modal contrastive loss that improves motion feature representation for video memorability prediction, achieves state-of-the-art results, and applies memorability prediction to video summarization.
Contribution
The paper proposes the Text-Motion Cross-modal Contrastive Loss (TMCCL) to better utilize motion cues and introduces MWCVS to improve video summarization using memorability prediction.
Findings
Achieved state-of-the-art performance on two datasets.
Demonstrated effectiveness of memorability in video summarization.
Enhanced motion feature representation through TMCCL.
Abstract
Video memorability refers to the ability of videos to be recalled after viewing, playing a crucial role in creating content that remains memorable. Existing models typically focus on extracting multimodal features to predict video memorability scores but often fail to fully utilize motion cues. The representation of motion features is compromised during the fine-tuning phase of the motion feature extractor due to a lack of labeled data. In this paper, we introduce the Text-Motion Cross-modal Contrastive Loss (TMCCL), a multimodal video memorability prediction model designed to enhance the representation of motion features. We tackle the challenge of improving motion feature representation by leveraging text description similarities across videos to establish positive and negative motion sample sets for a given target. This enhancement allows the model to learn similar feature…
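The abstract's core idea, using text-description similarity to pick positive and negative motion samples for a target video, can be illustrated with a small sketch. This is not the authors' implementation: the top-k selection rule, the InfoNCE-style loss form, the temperature value, and all function names here are assumptions chosen for illustration, and the embeddings are plain Python lists rather than outputs of real text or motion encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def text_guided_positives(text_emb, k):
    """For each video, pick the k other videos whose text is most similar.

    Assumption: a top-k rule; the source does not say how the positive
    and negative sets are thresholded.
    """
    n = len(text_emb)
    positives = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(text_emb[i], text_emb[j]), reverse=True)
        positives.append(set(others[:k]))
    return positives


def motion_contrastive_loss(text_emb, motion_emb, k=1, temperature=0.1):
    """InfoNCE-style loss over motion features with text-chosen positives.

    Every video that is not a positive for the anchor acts as a negative.
    Assumption: this loss form stands in for TMCCL, whose exact
    definition is not given in the text above.
    """
    n = len(motion_emb)
    positives = text_guided_positives(text_emb, k)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = {j: cosine(motion_emb[i], motion_emb[j]) / temperature for j in others}
        # Numerically stable log-sum-exp over all other videos.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-probability of the anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives[i]) / len(positives[i])
    return total / n


# Toy data: videos 0 and 1 share similar descriptions, as do videos 2 and 3.
text = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_motion = [row[:] for row in text]  # motion agrees with text
misaligned_motion = [text[0], text[2], text[1], text[3]]  # videos 1 and 2 swapped

print(motion_contrastive_loss(text, aligned_motion))
print(motion_contrastive_loss(text, misaligned_motion))
```

In this toy setup the loss is lower when videos with similar descriptions also have similar motion features, which is the behavior such a loss is meant to encourage during training.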
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
