AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models

Wallace Abreu; Luiz Wagner Pereira Biscainho

arXiv:2411.07364·eess.AS·November 13, 2024

TL;DR

AEROMamba introduces a novel architecture for audio super-resolution that replaces attention and LSTM modules with a state space model, significantly reducing memory usage and increasing speed while improving subjective audio quality.

Contribution

The paper presents AEROMamba, a modification of the AERO architecture for music super-resolution that replaces AERO's Attention and LSTM layers with Mamba, a state space model, achieving faster inference, lower GPU memory consumption, and better subjective audio quality.

Findings

01

14x faster inference compared to AERO

02

5x less GPU memory at inference; 2-4x less GPU memory during training

03

Superior subjective listening scores on MUSDB and PianoEval datasets

Abstract

Audio super-resolution aims to enhance low-resolution signals by creating high-frequency content. In this work, we modify the architecture of AERO (a state-of-the-art system for this task) for music super-resolution. Specifically, we replace its original Attention and LSTM layers with Mamba, a State Space Model (SSM), across all network layers. Mamba is capable of effectively substituting the mentioned modules, as it offers a mechanism similar to that of Attention while also functioning as a recurrent network. With the proposed AEROMamba, training requires 2-4x less GPU memory, since Mamba exploits the convolutional formulation and leverages GPU memory hierarchy. Additionally, during inference, Mamba operates in constant memory due to recurrence, avoiding memory growth associated with Attention. This results in a 14x speed improvement using 5x less GPU. Subjective listening tests (0 to…
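The abstract refers to two views of a state space model: a recurrent one (constant memory per step) and a convolutional one. The following is a minimal sketch of that equivalence for a toy time-invariant diagonal SSM. It is not Mamba's selective scan and is not taken from the AEROMamba code; the function names and the parameters a, b, c are illustrative assumptions.

```python
# Toy diagonal linear SSM (time-invariant). Illustrative only: Mamba itself uses
# input-dependent parameters, which this sketch does not model.
#   h[t] = a * h[t-1] + b * x[t]      (elementwise over state dimensions)
#   y[t] = sum_i c_i * h_i[t]

def ssm_recurrent(a, b, c, x):
    """Recurrent view: only the current state h is kept, so memory is constant in len(x)."""
    h = [0.0] * len(a)
    y = []
    for x_t in x:
        h = [a_i * h_i + b_i * x_t for a_i, b_i, h_i in zip(a, b, h)]
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y

def ssm_kernel(a, b, c, length):
    """Impulse response of the SSM: k[j] = sum_i c_i * a_i**j * b_i."""
    return [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(length)
    ]

def ssm_convolutional(a, b, c, x):
    """Convolutional view: y[t] = sum_{j=0..t} k[j] * x[t-j] (causal convolution)."""
    k = ssm_kernel(a, b, c, len(x))
    return [sum(k[j] * x[t - j] for j in range(t + 1)) for t in range(len(x))]

if __name__ == "__main__":
    # Hypothetical parameters and input, chosen only for the demonstration.
    a, b, c = [0.9, 0.5], [1.0, 0.5], [0.3, -0.2]
    x = [1.0, 0.0, -1.0, 2.0, 0.5]
    y_rec = ssm_recurrent(a, b, c, x)
    y_conv = ssm_convolutional(a, b, c, x)
    print(max(abs(p - q) for p, q in zip(y_rec, y_conv)))  # difference is at rounding level
```

In this toy setting both functions compute the same output. The recurrent form stores a fixed-size state regardless of sequence length, whereas an attention layer has to keep keys and values for every past position.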

Code & Models

Repositories

aeromamba-super-resolution/aeromamba
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Aerodynamics and Acoustics in Jet Flows

Methods: Softmax · Attention Is All You Need · Sigmoid Activation · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Tanh Activation · Long Short-Term Memory · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings