A dictionary learning and source recovery based approach to classify diverse audio sources
K V Vijay Girish, T V Ananthapadmanabha, A G Ramakrishnan

TL;DR
This paper introduces a dictionary learning-based method for classifying diverse audio sources. Atoms are selected by cosine similarity during dictionary learning, and three objective measures (SDR, number of non-zero weights, sum of weights) yield 98.2% frame-wise classification accuracy across twelve sources.
Contribution
It presents an audio source classification approach that combines dictionary learning with cosine-similarity-based atom selection and three objective measures: signal to distortion ratio (SDR), the number of non-zero weights, and the sum of weights.
Findings
Achieved 98.2% frame-wise classification accuracy across 12 sources.
100% accuracy using moving SDR accumulated over six successive frames for ten of the twelve sources.
The two remaining sources reach 100% accuracy when accumulation is extended to 10 and 14 frames.
Abstract
A dictionary learning based audio source classification algorithm is proposed to classify a sample audio signal as one of a finite set of audio sources. A cosine similarity measure is used to select the atoms during dictionary learning. Using three proposed objective measures, namely the signal to distortion ratio (SDR), the number of non-zero weights, and the sum of weights, a frame-wise source classification accuracy of 98.2% is obtained for twelve different sources. One hundred percent accuracy is obtained using a moving SDR accumulated over six successive frames for ten of the audio sources tested, while the other two sources require accumulation over 10 and 14 frames.
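The SDR-based decision described in the abstract can be illustrated with a rough sketch. It assumes each frame has already been reconstructed once per source dictionary, so that a matrix of per-frame, per-source SDR values is available. The function names (`sdr_db`, `classify_moving_sdr`), the usual energy-ratio definition of SDR in dB, and the use of a plain sum over the window as the "moving SDR" are assumptions made for illustration; the paper's exact definitions are not given in this summary.

```python
import numpy as np

def sdr_db(frame, estimate, eps=1e-12):
    """SDR in dB, assumed here to be signal energy over distortion energy."""
    frame = np.asarray(frame, dtype=float)
    distortion = frame - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10((np.sum(frame ** 2) + eps) / (np.sum(distortion ** 2) + eps))

def classify_moving_sdr(sdr, window=6):
    """Pick the source with the largest SDR summed over `window` successive frames.

    sdr: array of shape (n_frames, n_sources), one SDR value per frame and
    candidate source dictionary. Returns one decision per full window.
    """
    sdr = np.asarray(sdr, dtype=float)
    kernel = np.ones(window)
    # Moving sum along the frame axis, computed separately for each source.
    moving = np.stack(
        [np.convolve(sdr[:, s], kernel, mode="valid") for s in range(sdr.shape[1])],
        axis=1,
    )
    return np.argmax(moving, axis=1)

# Hypothetical SDR values: source 0 is correct, but frame 3 favours source 1.
sdr_table = np.array([[10.0, 5.0]] * 8)
sdr_table[3] = [4.0, 6.0]

frame_wise = np.argmax(sdr_table, axis=1)          # one decision per frame
accumulated = classify_moving_sdr(sdr_table, 6)    # one decision per 6-frame window
```

In this toy input the frame-wise decision is wrong on one frame, while every six-frame window selects source 0, which is the kind of effect the accumulation over successive frames is meant to exploit.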
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques
