TL;DR
This paper introduces an adaptive DCTNet feature extraction method for audio signals that improves classification accuracy by better capturing low-frequency information, outperforming traditional features like MFSC.
Contribution
The paper proposes the adaptive DCTNet (A-DCTNet), which is scale-adaptive and enhances low-frequency feature extraction for audio classification tasks.
Findings
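A-DCTNet features combined with a Recurrent Neural Network (RNN) classifier achieve a state-of-the-art bird song classification rate.
A-DCTNet features improve artist identification accuracy in music data.
A-DCTNet captures low-frequency acoustic information better than traditional features such as MFSC.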
Abstract
In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the adaptive DCTNet (A-DCTNet) for audio signal feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbanks geometrically spaced. The A-DCTNet adapts to different acoustic scales, and it captures the low-frequency acoustic information to which human hearing is sensitive better than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet and Recurrent Neural Networks (RNN) achieve state-of-the-art performance in bird song classification rate, and improve artist identification accuracy in music data. They demonstrate A-DCTNet's…
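The abstract's mention of constant-Q filterbanks with geometrically spaced center frequencies can be illustrated with a minimal sketch. This is not the paper's implementation: the function names and the parameters `f_min`, `bins_per_octave`, and `n_bins` are hypothetical, and the values used (55 Hz, 12 bins per octave) are chosen only for illustration.

```python
import math

def geometric_center_frequencies(f_min, bins_per_octave, n_bins):
    """Return n_bins center frequencies, each a fixed ratio above the previous one.

    All parameter names here are illustrative assumptions, not taken from the paper.
    """
    ratio = 2.0 ** (1.0 / bins_per_octave)  # constant ratio between adjacent centers
    return [f_min * ratio ** k for k in range(n_bins)]

def quality_factor(bins_per_octave):
    """Q = center frequency / bandwidth, identical for every filter in a constant-Q bank."""
    ratio = 2.0 ** (1.0 / bins_per_octave)
    return 1.0 / (ratio - 1.0)

# Illustrative values only: two octaves above 55 Hz, 12 bins per octave.
freqs = geometric_center_frequencies(f_min=55.0, bins_per_octave=12, n_bins=25)
q = quality_factor(12)

# Bandwidth grows in proportion to center frequency, so low-frequency
# filters are narrow and resolve low-frequency content finely.
bandwidths = [f / q for f in freqs]

for f, bw in zip(freqs[:3], bandwidths[:3]):
    print(f"center = {f:8.2f} Hz, bandwidth = {bw:6.2f} Hz")
```

Because adjacent centers differ by a fixed ratio rather than a fixed offset, the filters are densest at low frequencies, which is the property the abstract credits for better low-frequency resolution.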
