Rapid solution for searching similar audio items

Kastriot Kadriu

arXiv:2201.11178 · cs.SD · January 28, 2022

TL;DR

This paper introduces a rapid audio similarity search method using Locality Sensitive Hashing to handle high-dimensional feature vectors efficiently, leveraging sound production principles for feature selection.

Contribution

It proposes a novel approach combining sound production principles with hashing techniques to improve audio item retrieval speed and accuracy.

Findings

01 Significantly reduces search time for large audio datasets

02 Effectively handles high-dimensional feature vectors

03 Improves accuracy of audio similarity detection

Abstract

A naive approach to finding similar audio items is to compare the feature vector of the test example with the feature vector of every candidate, in a k-nearest-neighbors fashion. This approach has two problems: audio signals are represented by high-dimensional vectors, and the number of candidates can be very large, in the thousands. The search therefore has high computational complexity. This paper addresses the problem with hashing methods, specifically Locality Sensitive Hashing. The project is in the spirit of classification and clustering problems. Principles of computer sound production are used to determine which of the features that describe an audio signal are the most useful. This reduces the size of the feature vectors and, in turn, speeds up the search.
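To make the contrast in the abstract concrete, here is a minimal sketch of a brute-force nearest-neighbor search next to a Locality Sensitive Hashing index. The abstract does not say which LSH family the paper uses, so random-hyperplane (sign of random projection) hashing is assumed here purely for illustration. The names `brute_force_nearest` and `RandomHyperplaneLSH`, the parameters `n_bits` and `n_tables`, and the synthetic 16-dimensional vectors are all hypothetical and not taken from the paper.

```python
import math
import random
from collections import defaultdict


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def brute_force_nearest(query, candidates, k=1):
    """Naive baseline: score the query against every candidate vector."""
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine_distance(query, candidates[i]))
    return order[:k]


class RandomHyperplaneLSH:
    """Illustrative LSH index using random hyperplanes (an assumed hash family)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # Each hash table owns n_bits random hyperplanes through the origin.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []

    def _key(self, planes, vec):
        # One bit per hyperplane: the side of the plane the vector falls on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes
        )

    def add(self, vec):
        idx = len(self.vectors)
        self.vectors.append(vec)
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(idx)
        return idx

    def query(self, vec, k=1):
        # Shortlist only the items sharing a bucket with the query in any table.
        shortlist = set()
        for planes, table in zip(self.planes, self.tables):
            shortlist.update(table.get(self._key(planes, vec), []))
        # Rank the (usually much smaller) shortlist exactly.
        ranked = sorted(shortlist,
                        key=lambda i: cosine_distance(vec, self.vectors[i]))
        return ranked[:k]


# Usage with synthetic stand-ins for audio feature vectors.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
index = RandomHyperplaneLSH(dim=16)
for v in data:
    index.add(v)
nearest = index.query(data[42], k=3)
```

The brute-force function touches every candidate on each query, while the index only ranks candidates that collide with the query in at least one table. More bits per table shrink the buckets (faster, but more missed neighbors), and more tables recover some of those misses at the cost of memory. Results are approximate: a true neighbor that never shares a bucket with the query is not returned.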

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Face and Expression Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings