MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding

Jiajie Teng; Huiyu Duan; Yucheng Zhu; Sijing Wu; Guangtao Zhai

arXiv:2405.09286 · cs.MM · May 16, 2024

Open Access

TL;DR

This paper presents MVBind, a self-supervised model that learns cross-modal embeddings for automatic music recommendation in short videos, trained and evaluated on the newly constructed SVM-10K dataset.

Contribution

Introduces MVBind, a novel self-supervised embedding space binding model for cross-modal music-video retrieval for short videos, along with a new dataset, SVM-10K.

Findings

01 MVBind outperforms baseline methods on the SVM-10K dataset.

02 The constructed SVM-10K dataset enables effective training of music-video retrieval models.

03 Self-supervised learning reduces the need for manual annotations.

Abstract

Recent years have witnessed the rapid development of short videos, which usually contain both visual and audio modalities. Background music is important to short videos and can significantly influence viewers' emotions. However, at present, the background music of short videos is generally chosen by the video producer, and there is a lack of automatic music recommendation methods for short videos. This paper introduces MVBind, an innovative Music-Video embedding space Binding model for cross-modal retrieval. MVBind operates as a self-supervised approach, acquiring inherent knowledge of intermodal relationships directly from data, without the need for manual annotations. Additionally, to compensate for the lack of a corresponding musical-visual pair dataset for short videos, we construct a dataset, SVM-10K (Short Video with Music-10K), which mainly consists of meticulously…
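The abstract does not state the training objective, so the following is only a minimal sketch of how a music-video embedding space could be bound in a self-supervised way. It assumes a symmetric InfoNCE-style contrastive loss over paired video and music embeddings, where the pairing itself (a video and its own background music) supplies the supervision. The function names, the temperature value, and the use of cosine similarity for retrieval are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(video_emb, music_emb, temperature=0.07):
    """Hypothetical binding loss: row i of video_emb is paired with row i of music_emb.

    The temperature is an assumed hyperparameter, not a value from the paper.
    """
    v = l2_normalize(video_emb)
    m = l2_normalize(music_emb)
    logits = v @ m.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(v))                  # matching pairs lie on the diagonal
    loss_v2m = -log_softmax(logits, axis=1)[idx, idx].mean()  # video -> music
    loss_m2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # music -> video
    return 0.5 * (loss_v2m + loss_m2v)

def recommend_music(query_video_emb, music_library_emb, k=3):
    """Return indices of the k library tracks most similar to one video embedding."""
    q = l2_normalize(query_video_emb[None, :])
    lib = l2_normalize(music_library_emb)
    scores = (q @ lib.T)[0]
    return np.argsort(-scores)[:k]
```

Under these assumptions, no manual labels are needed: the loss is low when each video embedding is closest to the embedding of its own soundtrack, and recommendation at inference time reduces to a nearest-neighbour search over the music library.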

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Video Analysis and Summarization