Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification
R. Oguz Araz, Guillem Cortès-Sebastià, Emilio Molina, Joan Serrà, Xavier Serra, Yuki Mitsufuji, Dmitry Bogdanov

TL;DR
This paper improves neural audio fingerprinting robustness to real-world audio degradations by proposing best practices, systematically evaluating metric learning methods, and demonstrating state-of-the-art results with a self-supervised triplet loss approach.
Contribution
It introduces best practices for self-supervised training, systematically compares metric learning approaches, and achieves state-of-the-art performance in music identification under degraded conditions.
Findings
Self-supervised triplet loss outperforms other metric learning methods.
Training with multiple positives has different effects depending on the loss function.
The proposed approach achieves state-of-the-art results on degraded and real-world datasets.
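For readers unfamiliar with the two loss families compared in these findings, the following is a minimal sketch of a per-anchor triplet loss and a per-anchor NT-Xent loss. It is not the authors' implementation: the function names, the use of cosine similarity, and the `margin` and `temperature` defaults are illustrative assumptions, and real systems compute these losses over batches of learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on the similarity gap (margin value is an assumption).

    The loss is zero once the positive is more similar to the anchor
    than the negative by at least `margin`.
    """
    gap = cosine(anchor, positive) - cosine(anchor, negative)
    return max(0.0, margin - gap)

def nt_xent_loss(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy of selecting the positive among positive + negatives.

    Similarities are scaled by `temperature` (value is an assumption)
    and the log-sum-exp is computed in a numerically stable way.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]

# Toy 2-D "fingerprints": a clean segment, a degraded copy, an unrelated segment.
clean = [1.0, 0.0]
degraded = [0.9, 0.1]
unrelated = [0.0, 1.0]

print(triplet_loss(clean, degraded, unrelated))
print(nt_xent_loss(clean, degraded, [unrelated]))
```

The triplet loss considers one negative at a time and stops contributing once the margin is satisfied, whereas NT-Xent normalizes over every negative supplied and never reaches exactly zero. That structural difference is one reason the two losses can respond differently to changes in how positives and negatives are sampled.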
Abstract
Audio fingerprinting (AFP) allows the identification of unknown audio content by extracting compact representations, termed audio fingerprints, that are designed to remain robust against common audio degradations. Neural AFP methods often employ metric learning, where representation quality is influenced by the nature of the supervision and the utilized loss function. However, recent work unrealistically simulates real-life audio degradation during training, resulting in sub-optimal supervision. Additionally, although several modern metric learning approaches have been proposed, current neural AFP methods continue to rely on the NT-Xent loss without exploring the recent advances or classical alternatives. In this work, we propose a series of best practices to enhance the self-supervision by leveraging musical signal properties and realistic room acoustics. We then present the first…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
