Video-Music Retrieval: A Dual-Path Cross-Modal Network

Xin Gu; Yinghua Shen; Chaohui Lv

arXiv:2211.08878·cs.MM·November 17, 2022

Open Access

TL;DR

This paper introduces a dual-path cross-modal network for video-music retrieval that integrates content and emotional information, raising Recall@1 by 3.94 and Recall@25 by 16.36 over existing methods.

Contribution

The paper presents a dual-path network architecture with one path for content information and one for emotional information, a way to combine the two for video-music retrieval, and a modified metric loss suited to the task.

Findings

01. Recall@1 increased by 3.94

02. Recall@25 increased by 16.36

03. Effective merging of content and emotional information
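For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of how Recall@K is commonly computed for cross-modal retrieval. It assumes a square similarity matrix in which the correct music track for video i is track i; the function name `recall_at_k` and the toy scores are illustrative and are not taken from the paper.

```python
import numpy as np

def recall_at_k(similarity, k):
    """Fraction of queries whose ground-truth item ranks in the top k.

    similarity[i, j] is the score between video i and music track j.
    Assumption: the correct match for video i is track i.
    """
    similarity = np.asarray(similarity, dtype=float)
    # Score of the ground-truth pair for each query video.
    true_scores = np.diag(similarity)
    # Rank = number of tracks that score strictly higher than the true match.
    ranks = (similarity > true_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())

# Toy example: 3 videos scored against 3 music tracks (made-up values).
scores = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.5],
    [0.1, 0.4, 0.7],
])
print(recall_at_k(scores, 1))
print(recall_at_k(scores, 3))
```

A larger K is more forgiving, so Recall@25 is always at least as high as Recall@1 for the same system.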

Abstract

We propose a method for recommending background music for videos. Existing work rarely considers the emotional information of music, which is essential for video-music retrieval. To address this, we design two paths that process content information and emotional information across modalities. Based on the characteristics of video and music, we design several feature extraction schemes and common representation spaces. More importantly, we propose a way to combine content information with emotional information. Additionally, we modify the classical metric loss to better suit this task. Experiments show that this dual-path video-music retrieval network can effectively merge the two kinds of information. Compared with existing methods, it improves the retrieval metrics, increasing Recall@1 by 3.94 and Recall@25 by 16.36.
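The abstract does not spell out how the two paths are combined or how the metric loss is modified. As a rough illustration only, the sketch below fuses a content similarity and an emotion similarity with a weighted sum and applies a classical triplet margin loss to the result. The weight `alpha`, the margin, the cosine similarity, and all function names are assumptions made for this example; they are not the authors' method or their improved loss.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def fused_similarity(video_content, music_content,
                     video_emotion, music_emotion, alpha=0.5):
    """Weighted sum of the two path similarities (alpha is hypothetical)."""
    content = cosine_similarity_matrix(video_content, music_content)
    emotion = cosine_similarity_matrix(video_emotion, music_emotion)
    return alpha * content + (1.0 - alpha) * emotion

def triplet_margin_loss(similarity, margin=0.2):
    """Classical triplet loss over a batch of at least two pairs.

    Assumption: similarity[i, i] is the matched video-music pair and
    every other entry in row i is a negative.
    """
    positives = np.diag(similarity)[:, None]
    # Hinge on each negative: penalize negatives within `margin` of the positive.
    violations = np.maximum(0.0, margin + similarity - positives)
    np.fill_diagonal(violations, 0.0)  # a pair is not its own negative
    n = similarity.shape[0]
    return float(violations.sum() / (n * (n - 1)))

# Toy batch: random embeddings already projected into two common spaces.
rng = np.random.default_rng(0)
vc, mc = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
ve, me = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
loss = triplet_margin_loss(fused_similarity(vc, mc, ve, me))
print(loss)
```

The loss is zero once every matched pair outscores all mismatched pairs by at least the margin, which is the behavior a retrieval model trained with Recall@K in mind is pushed toward.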

Taxonomy

Topics: Music and Audio Processing · Video Analysis and Summarization · Diverse Musicological Studies