SSVMR: Saliency-based Self-training for Video-Music Retrieval

Xuxin Cheng; Zhihong Zhu; Hongxiang Li; Yaowei Li; Yuexian Zou

arXiv:2302.09328 · cs.MM · February 21, 2023 · 1 citation

TL;DR

This paper introduces SSVMR, a saliency-based self-training framework for video-music retrieval (VMR) that suppresses the effect of label noise and improves the capture of critical video clips, achieving state-of-the-art results.

Contribution

The paper proposes a novel semi-supervised, saliency-based self-training method for VMR that improves robustness to label noise and emphasizes critical video segments.

Findings

01. Achieves a 34.8% relative improvement in R@1 over previous models

02. Effectively suppresses label noise through semi-supervised self-training

03. Enhances critical video clip capture via saliency-based mixing
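
A relative improvement is conventionally measured against the baseline score, not as an absolute gap in percentage points. As a sketch of that usual definition (the subscripts "new" and "base" are labels introduced here, not notation from the paper, and the underlying R@1 values are not given on this page):

```latex
\[
\text{relative improvement}
  = \frac{\mathrm{R@1}_{\text{new}} - \mathrm{R@1}_{\text{base}}}{\mathrm{R@1}_{\text{base}}} \times 100\%
\]
```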

Abstract

With the rise of short videos, the demand for selecting appropriate background music (BGM) for a video has increased significantly, and the video-music retrieval (VMR) task has gradually drawn much attention from the research community. As in other cross-modal learning tasks, existing VMR approaches usually attempt to measure the similarity between the video and the music in the feature space. However, they (1) neglect the inevitable label noise and (2) do not enhance the ability to capture critical video clips. In this paper, we propose a novel saliency-based self-training framework, termed SSVMR. Specifically, we first make full use of the information contained in the training dataset by applying a semi-supervised method, in which a self-training approach is adopted, to suppress the adverse impact of label noise. In addition, we propose to capture the saliency of the video by mixing…
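
The abstract describes retrieval as measuring video-music similarity in a shared feature space, and the findings report results as R@1. The following is a minimal sketch of how such a score is commonly computed, not the paper's implementation. It assumes cosine similarity as the similarity measure and assumes that the ground-truth music for video i is music i; the function names `cosine_similarity_matrix` and `recall_at_k` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(video_emb, music_emb):
    """Return an (n_videos, n_music) matrix of cosine similarities.

    Assumption: cosine similarity stands in for whatever similarity
    measure a given VMR model actually uses.
    """
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    return v @ m.T


def recall_at_k(sim, k=1):
    """Fraction of videos whose paired music ranks in the top k.

    Assumption: the correct music for video i is music i, so the
    ground-truth scores lie on the diagonal of `sim`.
    """
    n = sim.shape[0]
    correct = sim[np.arange(n), np.arange(n)]
    # Rank = number of candidates scored strictly higher than the true pair.
    ranks = (sim > correct[:, None]).sum(axis=1)
    return float((ranks < k).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    videos = rng.normal(size=(5, 8))
    # Toy data: each music embedding is a noisy copy of its paired video.
    music = videos + 0.1 * rng.normal(size=(5, 8))
    sim = cosine_similarity_matrix(videos, music)
    print("R@1:", recall_at_k(sim, k=1))
```

Counting strictly higher scores means that ties are resolved in favor of the true pair; a stricter evaluation could count ties against it instead.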

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Cancer-related molecular mechanisms research