Leveraging User-Generated Metadata of Online Videos for Cover Song Identification

Simon Hachmeier; Robert Jäschke

arXiv:2412.11818 · cs.MM · December 17, 2024

TL;DR

This paper presents a multi-modal approach that combines user-generated metadata and audio content to improve cover song identification on YouTube, demonstrating that metadata can enhance the stability and accuracy of retrieval.

Contribution

It introduces a multi-modal method for cover song identification that integrates user-generated video metadata with audio content, combining entity resolution models and audio-based approaches through a ranking model.

Findings

01 Metadata improves identification stability

02 Multi-modal approach outperforms audio-only methods

03 Metadata integration enhances retrieval accuracy

Abstract

YouTube is a rich source of cover songs. Since the platform itself is organized in terms of videos rather than songs, the retrieval of covers is not trivial. The field of cover song identification addresses this problem and provides approaches that usually rely on audio content. However, including the user-generated video metadata available on YouTube promises improved identification results. In this paper, we propose a multi-modal approach for cover song identification on online video platforms. We combine entity resolution models with audio-based approaches using a ranking model. Our findings indicate that leveraging user-generated metadata can stabilize cover song identification performance on YouTube.
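
The idea of combining a metadata signal with an audio signal in one ranking can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the entity resolution models, the audio models, or the ranking model, so everything below is an assumption made for illustration. The `Candidate` type, the token-overlap `metadata_score`, the fixed `weight`, and the example titles and audio scores are all hypothetical, and a simple weighted sum stands in for a learned ranking model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    title: str          # user-generated video title (hypothetical example data)
    audio_score: float  # similarity from some audio-based model, assumed in [0, 1]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def metadata_score(query: str, title: str) -> float:
    """Fraction of query tokens that also occur in the video title.

    A crude stand-in for an entity resolution model (assumption).
    """
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(title)) / len(q)


def rank(query: str, candidates: list[Candidate], weight: float = 0.5) -> list[Candidate]:
    """Sort candidates by a weighted sum of metadata and audio scores.

    weight=0.0 ranks by audio only; weight=1.0 ranks by metadata only.
    """
    def fused(c: Candidate) -> float:
        return weight * metadata_score(query, c.title) + (1.0 - weight) * c.audio_score

    return sorted(candidates, key=fused, reverse=True)


if __name__ == "__main__":
    candidates = [
        Candidate("a", "Yesterday - The Beatles (acoustic cover)", 0.55),
        Candidate("b", "Relaxing piano music for studying", 0.60),
        Candidate("c", "Beatles Yesterday live cover", 0.40),
    ]
    for c in rank("Yesterday The Beatles", candidates):
        print(c.video_id, c.title)
```

In this toy data the unrelated video has the highest audio score, so an audio-only ranking (`weight=0.0`) would place it first, while the fused ranking moves the two videos whose titles match the query ahead of it. A real system would replace the weighted sum with a trained ranking model and the token overlap with a proper entity resolution model.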

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Music History and Culture