Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning
Shuaicheng Li, Feng Zhang, Kunlin Yang, Lingbo Liu, Shinan Liu, Jun Hou, Shuai Yi

TL;DR
This paper introduces a novel video highlight detection method that combines intra- and cross-modality encoding with a hard-pairs guided contrastive learning scheme to improve representation quality and discriminative power.
Contribution
It proposes a comprehensive multi-modal encoding framework and a hard-pairs sampling strategy for contrastive learning, advancing the state-of-the-art in video highlight detection.
Findings
Outperforms existing methods on benchmark datasets
Enhances intra- and cross-modality relation modeling
Improves feature discrimination with hard-pairs contrastive learning
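The summary above does not spell out how the hard pairs are chosen or which loss is used. As a rough illustration of the general idea only, the sketch below computes an InfoNCE-style contrastive loss that keeps just the k negatives most similar to the anchor. The function name hard_negative_info_nce, the top-k selection rule, the temperature value, and the toy vectors are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def hard_negative_info_nce(anchor, positive, negatives, k=2, temperature=0.1):
    """InfoNCE-style loss using only the k hardest negatives (illustrative).

    Assumption: "hard" means highest cosine similarity to the anchor.
    The paper's actual sampling rule may differ.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    p = normalize(np.asarray(positive, dtype=float))
    n = normalize(np.asarray(negatives, dtype=float))

    pos_sim = float(a @ p)          # similarity to the positive sample
    neg_sim = n @ a                 # similarity to every negative, shape (N,)
    hard = np.sort(neg_sim)[-k:]    # keep the k most similar (hardest) negatives

    logits = np.concatenate(([pos_sim], hard)) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_prob = logits[0] - np.log(np.exp(logits).sum())
    return float(-log_prob)


# Toy 2-D features: one negative is nearly identical to the anchor (hard),
# the other two are far away (easy).
anchor = [1.0, 0.0]
positive = [1.0, 0.1]
negatives = [[0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]

loss_hard = hard_negative_info_nce(anchor, positive, negatives, k=1)
loss_easy = hard_negative_info_nce(anchor, positive, [[-1.0, 0.0]], k=1)
```

In this toy setup the hard negative yields a much larger loss than the easy one, which is the intuition behind focusing training on hard pairs: easy negatives contribute almost no gradient signal.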
Abstract
Video highlight detection is a crucial yet challenging problem that aims to identify the interesting moments in untrimmed videos. The key to this task lies in effective video representations that jointly pursue two goals, i.e., cross-modal representation learning and fine-grained feature discrimination. In this paper, these two challenges are tackled by not only enriching intra-modality and cross-modality relations for representation modeling but also shaping the features in a discriminative manner. Our proposed method mainly leverages intra-modality encoding and cross-modality co-occurrence encoding for full representation modeling. Specifically, intra-modality encoding augments the modality-wise features and dampens irrelevant modalities via within-modality relation learning in both audio and visual signals. Meanwhile, cross-modality co-occurrence encoding focuses on the…
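The abstract is cut off before it describes the encoders in detail, so the following is only a loose sketch of what within-modality and cross-modality relation learning can look like in general. It assumes plain scaled dot-product attention for both steps; the helper names softmax and attend, the feature sizes, and the concatenation-based fusion are hypothetical choices for illustration and should not be read as the authors' architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Scaled dot-product attention with no learned projections (assumption)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores) @ keys_values


# Hypothetical per-segment features: 8 video segments, 16-dim per modality.
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 16))
audio = rng.normal(size=(8, 16))

# Intra-modality step: each modality attends to its own segments.
visual_intra = attend(visual, visual)
audio_intra = attend(audio, audio)

# Cross-modality step: visual segments query the audio segments.
visual_cross = attend(visual_intra, audio_intra)

# Simple fusion by concatenation (an assumption, not the paper's fusion).
fused = np.concatenate([visual_intra, visual_cross], axis=-1)
```

A real model would add learned query, key, and value projections and a scoring head that maps each fused segment feature to a highlight score; those parts are omitted here to keep the relation-modeling idea visible.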
Taxonomy
Topics: Video Analysis and Summarization · Image Enhancement Techniques · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Learning
