Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng, Kun Zhou, Yi-Wen Chao, Zhiwei Xiong, Bin Ma, Eng Siong Chng

TL;DR
MUFFIN is a neural psychoacoustic coding framework that combines multi-band spectral residual vector quantization with a transformer-inspired convolutional backbone to achieve high-fidelity, efficient audio compression and effective downstream task performance.
Contribution
Introduces MUFFIN, a fully convolutional neural psychoacoustic coding framework with multi-band spectral residual vector quantization and a transformer-inspired backbone for improved audio compression.
Findings
Outperforms existing approaches in reconstruction quality across multiple benchmarks.
Achieves a state-of-the-art 12.5 Hz compression rate with minimal quality loss.
Effective as a token representation for downstream generative tasks.
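To put the 12.5 Hz figure in context, the sketch below converts a token rate into a bitrate. It assumes that "12.5 Hz" means 12.5 token frames per second, which this summary does not state explicitly. The codebook count and codebook size are hypothetical values chosen only to make the arithmetic concrete; they are not taken from the paper.

```python
import math

def bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second for a codec that emits `num_codebooks` indices per frame."""
    bits_per_index = math.log2(codebook_size)  # e.g. 1024 entries -> 10 bits
    return frame_rate_hz * num_codebooks * bits_per_index

# Hypothetical configuration (not from the paper):
# 12.5 frames/s, 4 codebooks, 1024 entries per codebook.
example = bitrate_bps(12.5, num_codebooks=4, codebook_size=1024)
print(example)  # 12.5 * 4 * 10 = 500.0 bits per second
```

Under these assumed settings, halving the frame rate or the number of codebooks halves the bitrate, which is why a low frame rate matters for both compression and downstream sequence length.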
Abstract
Achieving high-fidelity audio compression while preserving perceptual quality across diverse content remains a key challenge in Neural Audio Coding (NAC). We introduce MUFFIN, a fully convolutional Neural Psychoacoustic Coding (NPC) framework that leverages psychoacoustically guided multi-band frequency reconstruction. At its core is a Multi-Band Spectral Residual Vector Quantization (MBS-RVQ) module that allocates bitrate across frequency bands based on perceptual salience. This design enables efficient compression while disentangling speaker identity from content using distinct codebooks. MUFFIN incorporates a transformer-inspired convolutional backbone and a modified snake activation to enhance resolution in fine-grained spectral regions. Experimental results on multiple benchmarks demonstrate that MUFFIN consistently outperforms existing approaches in reconstruction quality. A…
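The idea of residual vector quantization applied per frequency band can be illustrated with a minimal sketch. This is not the authors' MBS-RVQ implementation: the band edges, the number of stages per band, the codebook size, and the random (untrained) codebooks are all hypothetical, and the rule of giving lower bands more stages is only a stand-in for the perceptual-salience allocation the abstract mentions.

```python
import random

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared L2 distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def rvq_encode(vec, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = list(vec)
    indices = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out

def split_bands(spectrum, edges):
    """Slice one spectral frame into contiguous bands at the given bin edges."""
    return [spectrum[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]

def random_codebooks(rng, stages, size, dim):
    """Untrained random codebooks; later stages use smaller magnitudes."""
    return [[[rng.gauss(0.0, 1.0 / (2 ** s)) for _ in range(dim)]
             for _ in range(size)] for s in range(stages)]

rng = random.Random(0)
edges = [0, 4, 8, 16]        # hypothetical band edges, in frequency bins
stages_per_band = [4, 2, 1]  # hypothetical allocation: more stages for low bands
frame = [rng.gauss(0.0, 1.0) for _ in range(edges[-1])]

bands = split_bands(frame, edges)
books = [random_codebooks(rng, s, size=16, dim=len(b))
         for s, b in zip(stages_per_band, bands)]
codes = [rvq_encode(b, cb) for b, cb in zip(bands, books)]
recon = [x for c, cb in zip(codes, books) for x in rvq_decode(c, cb)]
```

Here each band receives its own stack of codebooks, so the number of indices spent on a band (and therefore its share of the bitrate) is set by its stage count. A trained system would learn the codebooks rather than sample them at random.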
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing
