AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition

Kin Wai Lau; Yasar Abbas Ur Rehman; Lai-Man Po

arXiv:2404.13551·cs.SD·April 23, 2024

TL;DR

AudioRepInceptionNeXt is a lightweight neural network architecture for audio recognition. It cuts parameters and computation by over 50% and runs inference 1.28 times faster than state-of-the-art CNNs while maintaining comparable accuracy, which makes it suitable for edge devices.

Contribution

The paper introduces AudioRepInceptionNeXt, a single-stream architecture for audio recognition. It is built from cascaded multi-scale depth-wise convolutions and is inspired by efficient vision models such as InceptionNeXt and ConvNeXt.

Findings

01

Reduces parameters and computations by over 50%.

02

Improves inference speed by 1.28 times over state-of-the-art CNNs.

03

Maintains comparable accuracy across various audio tasks.

Abstract

Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-Spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt, a single-stream architecture. Its basic building block breaks down the parallel multi-branch depth-wise convolutions with descending scales of k x k kernels into a cascade of two multi-branch depth-wise convolutions. The first multi-branch consists of parallel multi-scale 1 x k depth-wise convolutional layers followed by a similar multi-branch employing parallel multi-scale k x 1 depth-wise convolutional layers. This reduces computational and memory footprint while separating time and frequency…
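The footprint argument in the abstract can be illustrated with a rough weight count. The sketch below compares parallel k x k depth-wise branches against the cascade of 1 x k and k x 1 depth-wise branches. The channel count, the kernel sizes, and the assumption that every branch acts on all channels are hypothetical choices made for illustration and are not taken from the paper. Only depth-wise weights are counted (no biases, normalization, or point-wise layers), so the ratio is not the paper's reported overall reduction.

```python
# Illustrative weight count only; CHANNELS and KERNEL_SIZES are hypothetical
# values chosen for this sketch, not the configuration used in the paper.

def depthwise_kxk_params(channels: int, k: int) -> int:
    """Weights in one k x k depth-wise conv: one k*k filter per channel, no bias."""
    return channels * k * k


def depthwise_cascade_params(channels: int, k: int) -> int:
    """Weights in a 1 x k depth-wise conv followed by a k x 1 depth-wise conv."""
    return channels * k + channels * k


def block_params(channels: int, kernel_sizes: tuple, factorized: bool) -> int:
    """Sum depth-wise weights over all parallel branches of one block.

    Assumes every branch is applied to all channels (an assumption of this sketch).
    """
    per_branch = depthwise_cascade_params if factorized else depthwise_kxk_params
    return sum(per_branch(channels, k) for k in kernel_sizes)


CHANNELS = 64               # hypothetical channel width
KERNEL_SIZES = (3, 7, 11)   # hypothetical multi-scale kernel sizes

square = block_params(CHANNELS, KERNEL_SIZES, factorized=False)
cascade = block_params(CHANNELS, KERNEL_SIZES, factorized=True)

print(f"parallel k x k branches:      {square} weights")
print(f"cascaded 1 x k then k x 1:    {cascade} weights")
print(f"cascade / square ratio:       {cascade / square:.3f}")
```

Per branch, the count drops from C·k² to 2·C·k, so the saving grows with the kernel size. That is why factorizing the largest kernels matters most.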

Code & Models

Repositories

stevenlauhkhk/audiorepinceptionnext
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis