Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval
Zeyu Chen, Pengfei Zhang, Kai Ye, Wei Dong, Xin Feng, Yana Zhang

TL;DR
This paper introduces a novel inter-intra modal (II) loss for cross-modal video-music retrieval, addressing false negatives in self-supervised learning and achieving state-of-the-art results on the YouTube8M dataset.
Contribution
The paper proposes the II loss to reduce false negative noise in video-music retrieval, improving model robustness and performance in both self-supervised and supervised settings.
Findings
II loss effectively reduces false negative noise.
II-CLVM achieves state-of-the-art performance on YouTube8M.
II loss enhances retrieval accuracy with fewer training samples.
Abstract
The burgeoning short video industry has accelerated the advancement of video-music retrieval technology, assisting content creators in selecting appropriate music for their videos. In self-supervised training for video-to-music retrieval, the video and music samples in the dataset are separated from the same video work, so they are all one-to-one matches. This does not reflect reality: a video can use different pieces of music as background music, and a piece of music can serve as background music for different videos. Many videos and music tracks that are not paired may still be compatible, leading to false negative noise in the dataset. A novel inter-intra modal (II) loss is proposed as a solution. By reducing the variation of feature distribution within the two modalities before and after the encoder, II loss can reduce the model's overfitting to such noise without removing it in a costly…
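The abstract is truncated here and does not give the loss formula, so the following is only a minimal sketch of one plausible reading of the idea, not the paper's formulation. It assumes the inter-modal part is a symmetric contrastive (InfoNCE-style) loss over paired video and music embeddings, and that the intra-modal part penalizes changes in pairwise cosine similarity within each modality between encoder input and encoder output. All function names, the `temperature` and `weight` parameters, and the choice of a mean squared gap are assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_matrix(a, b):
    # Pairwise cosine similarity between the rows of a and the rows of b.
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]

def inter_modal_loss(video_emb, music_emb, temperature=0.1):
    # Assumed inter-modal term: symmetric InfoNCE where item i in one
    # modality has item i in the other modality as its only positive.
    sim = cosine_matrix(video_emb, music_emb)
    n = len(sim)
    total = 0.0
    for i in range(n):
        row = [s / temperature for s in sim[i]]                  # video i -> all music
        col = [sim[j][i] / temperature for j in range(n)]        # music i -> all videos
        for logits in (row, col):
            m = max(logits)
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_z - logits[i]                           # cross-entropy, target i
    return total / (2 * n)

def intra_modal_loss(inputs, embeddings):
    # Assumed intra-modal term: mean squared gap between the pairwise
    # similarity structure before the encoder and after it.
    before = cosine_matrix(inputs, inputs)
    after = cosine_matrix(embeddings, embeddings)
    n = len(before)
    gap = sum((before[i][j] - after[i][j]) ** 2
              for i in range(n) for j in range(n))
    return gap / (n * n)

def ii_loss(video_in, music_in, video_emb, music_emb, weight=1.0):
    # Hypothetical combination: inter-modal term plus weighted intra-modal terms.
    inter = inter_modal_loss(video_emb, music_emb)
    intra = (intra_modal_loss(video_in, video_emb)
             + intra_modal_loss(music_in, music_emb))
    return inter + weight * intra
```

Under these assumptions, the intra-modal term is zero when the encoder preserves the similarity structure of a modality, so two videos that looked alike before encoding are discouraged from being pushed far apart merely because their paired tracks differ. That is one way a regularizer of this kind could soften the effect of false negatives without explicitly detecting them.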
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
