Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)

Kazi Nazmul Haque; Rajib Rana; Tasnim Jarin; Bjorn W. Schuller Jr

arXiv:2412.00312·cs.SD·December 3, 2024

Open Access

TL;DR

This paper introduces Cosine Convolutional Neural Networks (CosCovNN) for raw audio classification, demonstrating improved accuracy and efficiency over traditional CNNs, and presents an augmented model, VQCCM, that achieves state-of-the-art results across multiple datasets.

Contribution

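The paper proposes replacing traditional CNN filters with cosine filters for raw audio classification, which cuts the parameter count by approximately 77% while surpassing the accuracy of the equivalent CNN architectures. It also introduces VQCCM, an augmented CosCovNN that achieves state-of-the-art results across five datasets.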
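
The parameter saving from cosine filters can be illustrated with a minimal sketch. It assumes, purely for illustration, that each filter is a cosine sampled at the kernel positions and controlled by two learnable scalars (a frequency and a phase). This parameterisation and the names `cosine_filter`, `cosine_conv1d`, and `param_counts` are hypothetical and not taken from the paper, whose exact filter definition is not given here.

```python
import numpy as np


def cosine_filter(freq, phase, kernel_size):
    """Build one 1-D filter whose taps are samples of a cosine.

    ASSUMPTION: `freq` and `phase` stand in for the learnable parameters;
    the paper's actual parameterisation may differ.
    """
    # Tap positions centred on zero, e.g. [-2, -1, 0, 1, 2] for kernel_size=5.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2.0
    return np.cos(freq * n + phase)


def cosine_conv1d(signal, freqs, phases, kernel_size):
    """Slide a bank of cosine filters over a 1-D signal (no padding, stride 1)."""
    filters = np.stack(
        [cosine_filter(f, p, kernel_size) for f, p in zip(freqs, phases)]
    )
    # Every contiguous window of the signal: shape (num_windows, kernel_size).
    windows = np.lib.stride_tricks.sliding_window_view(signal, kernel_size)
    # Dot each window with each filter: shape (num_windows, num_filters).
    return windows @ filters.T


def param_counts(num_filters, kernel_size):
    """Compare learnable weights per layer (biases ignored)."""
    return {
        "standard": num_filters * kernel_size,  # one weight per tap
        "cosine": num_filters * 2,              # frequency + phase per filter
    }
```

Under these assumptions, a bank of 4 filters of length 9 needs 8 parameters instead of 36. The ratio depends on the kernel size and on layers not shown, so this sketch does not reproduce the reported figure of approximately 77%.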

Findings

1. CosCovNN surpasses traditional CNN accuracy with 77% fewer parameters.
2. VQCCM achieves state-of-the-art performance on five datasets.
3. Cosine filters improve CNN efficiency and accuracy in raw audio tasks.

Abstract

This study explores audio classification from raw waveforms using Convolutional Neural Networks (CNNs), an approach that eliminates the need to extract specialised features in a pre-processing step. Unlike recent trends in the literature, which often focus on designing frontends or filters for only the initial layers of CNNs, our research introduces the Cosine Convolutional Neural Network (CosCovNN), which replaces the traditional CNN filters with cosine filters. The CosCovNN surpasses the accuracy of the equivalent CNN architectures with approximately 77% fewer parameters. Our research further progresses with the development of an augmented CosCovNN named Vector Quantised Cosine Convolutional Neural Network with Memory (VQCCM), incorporating a memory and vector quantisation layer. VQCCM achieves state-of-the-art (SOTA) performance across five different datasets in comparison…

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing