Simultaneously Learning Robust Audio Embeddings and balanced Hash codes for Query-by-Example

Anup Singh; Kris Demuynck; Vipul Arora

arXiv:2211.11060 · eess.AS · January 20, 2023

TL;DR

This paper introduces a self-supervised learning framework that simultaneously generates robust audio embeddings and balanced hash codes, improving retrieval speed and accuracy in large-scale audio fingerprinting systems.

Contribution

It proposes an end-to-end approach that casts hash-code generation as a balanced clustering problem, treated as an instance of optimal transport, and reports better retrieval efficiency than existing indexing methods while preserving accuracy.

Findings

01. Improved retrieval efficiency at high distortion levels.
02. High accuracy maintained with balanced hash codes.
03. System is scalable in computation and memory.

Abstract

Audio fingerprinting systems must efficiently and robustly identify query snippets in an extensive database. To this end, state-of-the-art systems use deep learning to generate compact audio fingerprints. These systems deploy indexing methods, which quantize fingerprints to hash codes in an unsupervised manner to expedite the search. However, these methods generate imbalanced hash codes, leading to their suboptimal performance. Therefore, we propose a self-supervised learning framework to compute fingerprints and balanced hash codes in an end-to-end manner to achieve both fast and accurate retrieval performance. We model hash codes as a balanced clustering process, which we regard as an instance of the optimal transport problem. Experimental results indicate that the proposed approach improves retrieval efficiency while preserving high accuracy, particularly at high distortion levels,…
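
To make the "balanced clustering as optimal transport" idea concrete, here is a minimal sketch using Sinkhorn-Knopp iterations, a standard way to solve entropy-regularized optimal transport with uniform marginals. The abstract does not say which solver or hyperparameters the authors use, so the function name `sinkhorn_balanced_assignments`, the parameters `epsilon` and `n_iters`, and the toy similarity scores are all assumptions made for illustration, not the paper's implementation.

```python
import math


def sinkhorn_balanced_assignments(scores, epsilon=0.05, n_iters=50):
    """Turn an n x k score matrix into soft, balanced cluster assignments.

    Hypothetical helper: `scores[i][j]` is the similarity between
    fingerprint i and hash bucket (cluster) j. Returns an n x k matrix
    whose rows sum to 1 and whose columns each sum to roughly n / k.
    """
    n = len(scores)
    k = len(scores[0])

    # Subtract the global maximum before exponentiating for stability.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]

    for _ in range(n_iters):
        # Scale columns so every cluster receives total mass 1 / k.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        for i in range(n):
            for j in range(k):
                q[i][j] /= col_sums[j] * k

        # Scale rows so every sample carries total mass 1 / n.
        for i in range(n):
            row_sum = sum(q[i])
            for j in range(k):
                q[i][j] /= row_sum * n

    # Rescale so each row is a probability distribution over clusters.
    return [[value * n for value in row] for row in q]
```

Taking the arg-max of each returned row gives a hard bucket per fingerprint. Because the column constraint forces every bucket to carry the same total soft mass, samples are spread across buckets even when the raw scores all favor the same one, which is the imbalance the abstract attributes to unsupervised quantization.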

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Advanced Image and Video Retrieval Techniques