Early MFCC And HPCP Fusion for Robust Cover Song Identification
Christopher J. Tralie

TL;DR
This paper introduces an unsupervised method that fuses MFCC and HPCP features for cover song identification, reaching a mean reciprocal rank of 0.87 on Covers80, and presents a new benchmark dataset, Covers 1000.
Contribution
The author proposes an unsupervised algorithm that fuses normalized, beat-synchronous blocks of MFCC and HPCP features before local alignment, and that uses each song's structural information to fill alignment gaps where both feature sets fail.
Findings
Achieved a state-of-the-art mean reciprocal rank (MRR) of 0.87 on the Covers80 dataset.
Introduced the Covers 1000 benchmark dataset with 1000 songs and 395 cover groups.
Attained an MRR of 0.9 on Covers 1000, scored on the rank of the first correctly identified song.
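For readers unfamiliar with the metric, the following is a minimal sketch of how a mean reciprocal rank is computed. The function name `mean_reciprocal_rank` and the toy ranks are illustrative assumptions, not taken from the paper: each query contributes 1/rank, where rank is the 1-based position of the first correct cover in its result list.

```python
from typing import Sequence


def mean_reciprocal_rank(first_hit_ranks: Sequence[int]) -> float:
    """Average 1/rank over queries.

    Each entry is the 1-based rank of the first correctly identified
    cover for one query song (hypothetical input format).
    """
    if not first_hit_ranks:
        raise ValueError("need at least one query")
    if any(r < 1 for r in first_hit_ranks):
        raise ValueError("ranks are 1-based")
    return sum(1.0 / r for r in first_hit_ranks) / len(first_hit_ranks)


# Toy example: five queries whose first correct cover appears at these ranks.
ranks = [1, 1, 2, 1, 4]
mrr = mean_reciprocal_rank(ranks)  # (1 + 1 + 0.5 + 1 + 0.25) / 5 = 0.75
print(mrr)
```

An MRR near 1.0 therefore means the first correct cover is almost always the top result.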
Abstract
While most schemes for automatic cover song identification have focused on note-based features such as HPCP and chord profiles, a few recent papers surprisingly showed that local self-similarities of MFCC-based features also have classification power for this task. Since MFCC and HPCP capture complementary information, we design an unsupervised algorithm that combines normalized, beat-synchronous blocks of these features using cross-similarity fusion before attempting to locally align a pair of songs. As an added bonus, our scheme naturally incorporates structural information in each song to fill in alignment gaps where both feature sets fail. We show a striking jump in performance over MFCC and HPCP alone, achieving a state-of-the-art mean reciprocal rank of 0.87 on the Covers80 dataset. We also introduce a new medium-sized, hand-designed benchmark dataset called "Covers 1000," which…
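The pipeline described in the abstract (compare blocks of two feature types across a pair of songs, fuse, then locally align) can be illustrated with a deliberately simplified sketch. Everything here is an assumption for illustration: the function names, the threshold, and the toy vectors are hypothetical; the element-wise average is a plain stand-in and is not the paper's cross-similarity fusion; and the structural (self-similarity) information mentioned in the abstract is omitted. The alignment step is a basic Smith-Waterman-style recurrence.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cross_distance(a: Sequence[Sequence[float]],
                   b: Sequence[Sequence[float]]) -> Matrix:
    """Euclidean distance between every block of song A and every block of song B."""
    return [[math.dist(x, y) for y in b] for x in a]


def normalize(d: Matrix) -> Matrix:
    """Scale distances to [0, 1] so the two feature types are comparable."""
    peak = max(max(row) for row in d)
    if peak == 0:
        return [[0.0 for _ in row] for row in d]
    return [[v / peak for v in row] for row in d]


def fuse(d_mfcc: Matrix, d_hpcp: Matrix) -> Matrix:
    """Stand-in fusion (assumption): element-wise mean of normalized matrices."""
    n1, n2 = normalize(d_mfcc), normalize(d_hpcp)
    return [[(u + v) / 2 for u, v in zip(r1, r2)] for r1, r2 in zip(n1, n2)]


def local_alignment_score(d: Matrix, threshold: float = 0.25,
                          match: float = 1.0, penalty: float = 1.0) -> float:
    """Smith-Waterman-style score that rewards diagonal runs of close block pairs."""
    rows, cols = len(d), len(d[0])
    score = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # A pair of blocks "matches" when its fused distance is small.
            step = match if d[i - 1][j - 1] <= threshold else -penalty
            score[i][j] = max(0.0,
                              score[i - 1][j - 1] + step,   # extend diagonally
                              score[i - 1][j] - penalty,    # gap in song B
                              score[i][j - 1] - penalty)    # gap in song A
            best = max(best, score[i][j])
    return best


# Toy per-block feature vectors (hypothetical values, four blocks per song).
mfcc_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
hpcp_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# Comparing a song with itself: all four blocks align on the diagonal.
fused = fuse(cross_distance(mfcc_a, mfcc_a), cross_distance(hpcp_a, hpcp_a))
score = local_alignment_score(fused)  # 4.0 for this toy pair
print(score)
```

Fusing before alignment means a pair of blocks only counts as a match when the combined evidence from both feature types agrees, which is the general motivation for early fusion given in the abstract.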
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
