VCNAC: A Variable-Channel Neural Audio Codec for Mono, Stereo, and Surround Sound
Florian Grötschla, Arunasish Sen, Alessandro Lombardi, Guillermo Cámbara, Andreas Schwarz

TL;DR
VCNAC introduces a unified neural audio codec capable of handling mono, stereo, and surround sound with a single model, ensuring high-quality reconstruction across various channel setups.
Contribution
The paper presents a variable-channel neural audio codec in which a single shared encoder-decoder parametrization supports multiple channel configurations.
Findings
Maintains high perceptual quality across mono, stereo, and surround sound.
Supports inference-time scalability across modalities and channel configurations, with generative language models trained on a single set of codebooks.
Achieves competitive objective spatial audio metrics and positive subjective listening results.
Abstract
We present VCNAC, a variable-channel neural audio codec. Our approach features a single encoder and decoder parametrization that enables native inference for different channel setups, from mono speech to cinematic 5.1-channel surround audio. Channel compatibility objectives ensure that multi-channel content maintains perceptual quality when decoded to fewer channels. The shared representation enables training of generative language models on a single set of codebooks while supporting inference-time scalability across modalities and channel configurations. Evaluation using objective spatial audio metrics and subjective listening tests demonstrates that our unified approach maintains high reconstruction quality across mono, stereo, and surround audio configurations.
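The abstract does not say how the channel compatibility objectives are defined. As a rough illustration of what "decoding to fewer channels" can mean, the sketch below folds a 5.1 frame down to stereo with the conventional -3 dB passive downmix gains (in the style of ITU-R BS.775) and measures how far a decoded signal's downmix drifts from the reference's. The channel order, the function names, and the use of mean squared error are all assumptions made for this example, not the authors' method.

```python
import math

# Assumed channel order for this sketch (hypothetical): L, R, C, LFE, Ls, Rs.
G = 1.0 / math.sqrt(2.0)  # -3 dB gain applied to center and surround channels


def downmix_51_to_stereo(frame):
    """Fold one 5.1 sample frame down to a stereo pair (Lo, Ro).

    The LFE channel is dropped, as is common in passive downmixes.
    """
    l, r, c, _lfe, ls, rs = frame
    return (l + G * c + G * ls, r + G * c + G * rs)


def downmix_stereo_to_mono(frame):
    """Average a stereo pair into a single mono sample."""
    lo, ro = frame
    return 0.5 * (lo + ro)


def compatibility_error(decoded_51, reference_51):
    """Mean squared error between the stereo downmixes of two 5.1 signals.

    Both arguments are sequences of 6-tuples of equal length. This is a
    stand-in for a compatibility objective, not the loss used in the paper.
    """
    total = 0.0
    count = 0
    for dec, ref in zip(decoded_51, reference_51):
        d = downmix_51_to_stereo(dec)
        r = downmix_51_to_stereo(ref)
        total += (d[0] - r[0]) ** 2 + (d[1] - r[1]) ** 2
        count += 2
    return total / count if count else 0.0
```

A codec that reconstructs the 5.1 signal exactly yields a compatibility error of zero under this measure; a codec that is accurate per channel but introduces inter-channel phase errors can still score poorly once the channels are summed, which is the kind of failure a compatibility objective would be meant to penalize.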
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
