Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding

Dianwen Ng; Kun Zhou; Yi-Wen Chao; Zhiwei Xiong; Bin Ma; Eng Siong Chng

arXiv:2505.07235 · cs.SD · May 13, 2025

TL;DR

MUFFIN is a neural psychoacoustic coding framework that combines multi-band spectral residual vector quantization with a transformer-inspired convolutional backbone to achieve high-fidelity, efficient audio compression and to serve as an effective token representation for downstream tasks.

Contribution

Introduces MUFFIN, a fully convolutional neural psychoacoustic coding framework with multi-band spectral residual vector quantization and a transformer-inspired backbone for improved audio compression.

Findings

1. Outperforms existing methods in reconstruction quality on benchmarks.
2. Achieves a state-of-the-art 12.5 Hz compression rate with minimal quality loss.
3. Effective as a token representation for downstream generative tasks.
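To put the 12.5 Hz figure in context, the sketch below shows the usual arithmetic relating a codec's rate to its bitrate. It assumes that "12.5 Hz" refers to the number of token frames emitted per second, which this summary does not state explicitly. The codebook count, codebook size, and sample rate are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math


def bitrate_bps(frame_rate_hz, num_codebooks, codebook_size):
    """Bits per second for a codec that emits `num_codebooks` indices per frame."""
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index


def samples_per_frame(sample_rate_hz, frame_rate_hz):
    """Number of input samples that each token frame has to summarize."""
    return sample_rate_hz / frame_rate_hz


# Only the 12.5 Hz figure comes from the text above.
# The codebook settings and the 16 kHz sample rate are hypothetical.
FRAME_RATE_HZ = 12.5
example_bps = bitrate_bps(FRAME_RATE_HZ, num_codebooks=4, codebook_size=1024)
example_hop = samples_per_frame(16_000, FRAME_RATE_HZ)

print(example_bps)  # 500.0
print(example_hop)  # 1280.0
```

Under these assumed settings, each frame would cover 1280 samples and the stream would cost 500 bits per second, which illustrates why a low frame rate is attractive for downstream token-based models.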

Abstract

Achieving high-fidelity audio compression while preserving perceptual quality across diverse content remains a key challenge in Neural Audio Coding (NAC). We introduce MUFFIN, a fully convolutional Neural Psychoacoustic Coding (NPC) framework that leverages psychoacoustically guided multi-band frequency reconstruction. At its core is a Multi-Band Spectral Residual Vector Quantization (MBS-RVQ) module that allocates bitrate across frequency bands based on perceptual salience. This design enables efficient compression while disentangling speaker identity from content using distinct codebooks. MUFFIN incorporates a transformer-inspired convolutional backbone and a modified snake activation to enhance resolution in fine-grained spectral regions. Experimental results on multiple benchmarks demonstrate that MUFFIN consistently outperforms existing approaches in reconstruction quality. A…
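The idea of quantizing frequency bands separately, with a different bit budget per band, can be sketched in a few lines. This is a minimal illustration of generic residual vector quantization applied per band, not the authors' MBS-RVQ implementation. The band edges, the toy codebooks, and the choice to give the lower band more quantizer stages are all assumptions made for the example; the summary does not say how MUFFIN defines its bands or allocates codebooks.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to `vec` (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Residual VQ: each stage quantizes what the earlier stages left over."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Sum the selected entry from every stage to rebuild the vector."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def multiband_encode(features, band_edges, band_codebooks):
    """Run an independent RVQ stack on each (start, end) slice of `features`."""
    return [
        rvq_encode(features[start:end], codebooks)
        for (start, end), codebooks in zip(band_edges, band_codebooks)
    ]


def multiband_decode(band_codes, band_codebooks):
    """Decode each band and concatenate the results in band order."""
    out = []
    for codes, codebooks in zip(band_codes, band_codebooks):
        out.extend(rvq_decode(codes, codebooks))
    return out


# Hypothetical toy setup: a 2-dim "low" band with two RVQ stages (more bits)
# and a 1-dim "high" band with a single stage (fewer bits).
low_band = [
    [[0.0, 0.0], [1.0, 1.0]],   # stage 1
    [[0.0, 0.0], [0.5, -0.5]],  # stage 2 refines the stage-1 residual
]
high_band = [
    [[0.0], [2.0]],             # single stage
]
band_edges = [(0, 2), (2, 3)]
band_codebooks = [low_band, high_band]

features = [1.5, 0.5, 2.0]
demo_codes = multiband_encode(features, band_edges, band_codebooks)
demo_recon = multiband_decode(demo_codes, band_codebooks)

print(demo_codes)  # [[1, 1], [1]]
print(demo_recon)  # [1.5, 0.5, 2.0]
```

In this toy case the low band spends two indices per frame and the high band one, which is one simple way to express an uneven bitrate allocation across bands. A trained codec would learn its codebooks rather than use hand-written ones.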

Code & Models

Repositories

dianwen-ng/muffin
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing