Audio Denoising for Robust Audio Fingerprinting
Kamil Akesbi

TL;DR
This paper introduces a hybrid deep learning approach to improve the robustness of audio fingerprinting systems against background noise by integrating a denoising model before peak extraction.
Contribution
The paper proposes a novel hybrid strategy that combines deep learning denoising with spectral peak-based fingerprinting, together with a new loss function tailored to this setting.
Findings
Enhanced robustness of audio fingerprinting (AFP) systems in noisy environments
Improved spectral peak accuracy with the denoising model
First evaluation of a hybrid deep learning and peak-based AFP approach
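The sketch below shows what spectral peak extraction and a "spectral peak accuracy" score could look like. It assumes a constellation-style picker: a time-frequency bin counts as a peak if it is the maximum of its local neighborhood and lies within 60 dB of the loudest bin. It then scores estimated peaks by exact-match precision and recall against peaks taken from the clean signal. The summary does not specify the paper's peak picker or accuracy metric, so `extract_peaks`, `peak_precision_recall`, and all parameter values are hypothetical. In the hybrid setup, the estimated peaks would be computed on the denoising model's output instead of directly on the noisy recording.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import stft


def extract_peaks(audio, sr, n_fft=1024, hop=256, neighborhood=(15, 15), floor_db=-60.0):
    """Return the set of (freq_bin, frame) local maxima of the log-magnitude spectrogram.

    Hypothetical constellation-style picker, not the paper's exact algorithm.
    """
    _, _, spec = stft(audio, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag_db = 20 * np.log10(np.abs(spec) + 1e-10)
    # A bin is a candidate peak if it equals the maximum over its neighborhood
    is_local_max = maximum_filter(mag_db, size=neighborhood) == mag_db
    # Discard bins more than |floor_db| dB below the loudest bin
    is_loud = mag_db > mag_db.max() + floor_db
    freq_bins, frames = np.nonzero(is_local_max & is_loud)
    return set(zip(freq_bins.tolist(), frames.tolist()))


def peak_precision_recall(reference, estimate):
    """Exact-match precision and recall of estimated peaks against reference peaks."""
    if not reference or not estimate:
        return 0.0, 0.0
    matched = len(reference & estimate)
    return matched / len(estimate), matched / len(reference)


# Example: peaks of a clean two-tone signal vs. the same signal with additive noise
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(2 * sr) / sr
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
clean_peaks = extract_peaks(clean, sr)
noisy_peaks = extract_peaks(noisy, sr)
precision, recall = peak_precision_recall(clean_peaks, noisy_peaks)
```

Exact matching is deliberately strict here; a practical metric would likely tolerate small offsets in time and frequency.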
Abstract
Music discovery services let users identify songs from short mobile recordings. These solutions are often based on audio fingerprinting (AFP) and, more specifically, rely on the extraction of spectral peaks in order to be robust to a number of distortions. Few works have studied the robustness of these algorithms to background noise captured in real environments. In particular, AFP systems still struggle when the signal-to-noise ratio (SNR) is low, i.e., when the background noise is strong. In this project, we tackle this problem with deep learning (DL). We test a new hybrid strategy that consists of inserting a denoising DL model in front of a peak-based AFP algorithm. We simulate noisy music recordings using a realistic data augmentation pipeline, and train a DL model to denoise them. The denoising model limits the impact of background noise on the AFP system's extracted peaks, improving…
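The abstract does not describe the augmentation pipeline beyond calling it realistic. As a minimal sketch of one basic ingredient only, the hypothetical function `mix_at_snr` adds a background-noise excerpt to a clean music excerpt at a chosen SNR by solving SNR_dB = 10 log10(P_music / (g^2 P_noise)) for the noise gain g. Anything more realistic (room acoustics, microphone response, and so on) is outside this sketch, and both signals in the example are synthetic.

```python
import numpy as np


def mix_at_snr(music, noise, snr_db, rng=None):
    """Return music + g * noise, with gain g chosen so the mix has SNR `snr_db` (in dB).

    Hypothetical helper illustrating SNR-controlled mixing, not the paper's pipeline.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Tile the noise if it is shorter than the music excerpt
    if len(noise) < len(music):
        noise = np.tile(noise, int(np.ceil(len(music) / len(noise))))
    # Take a random noise segment with the same length as the music
    start = rng.integers(0, len(noise) - len(music) + 1)
    noise = noise[start:start + len(music)]
    p_music = np.mean(music ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # avoid division by zero for silent noise
    # Solve snr_db = 10 * log10(p_music / (g**2 * p_noise)) for g
    gain = np.sqrt(p_music / (p_noise * 10 ** (snr_db / 10)))
    return music + gain * noise


# Example: a synthetic "music" tone mixed with white noise at several low SNRs
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
music = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(sr // 2)  # shorter than the music, so it gets tiled
mixes = {snr: mix_at_snr(music, noise, snr, rng) for snr in (-5, 0, 5)}
```

Sampling SNRs that include negative values is what exposes a model to the "strong background noise" regime the abstract singles out.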
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Test
