SpectroStream: A Versatile Neural Codec for General Audio
Yunpeng Li, Kehang Han, Brian McWilliams, Zalan Borsos, Marco Tagliasacchi

TL;DR
SpectroStream is a neural audio codec that achieves high-quality, full-band stereo music compression at low bit rates by leveraging time-frequency domain representations and a novel multi-channel architecture.
Contribution
It introduces SpectroStream, a neural codec extending SoundStream's capabilities to 48 kHz stereo audio with improved quality and a new delayed-fusion strategy for multi-channel handling.
Findings
Supports 48 kHz stereo music at 4-16 kbps
Outperforms previous codecs in audio quality at low bit rates
Uses a novel time-frequency domain neural architecture
Abstract
We propose SpectroStream, a full-band multi-channel neural audio codec. As the successor to the well-established SoundStream, SpectroStream extends its capability beyond 24 kHz monophonic audio and enables high-quality reconstruction of 48 kHz stereo music at bit rates of 4-16 kbps. This is accomplished with a new neural architecture that leverages audio representation in the time-frequency domain, which leads to better audio quality, especially at higher sample rates. The model also uses a delayed-fusion strategy to handle multi-channel audio, which is crucial in balancing per-channel acoustic quality and cross-channel phase consistency.
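To put the 4-16 kbps range in perspective, the short calculation below compares it with uncompressed PCM at the same sample rate and channel count. It is a back-of-the-envelope sketch, not something taken from the paper: the 16-bit sample depth is an assumption (the abstract does not state a bit depth), and the helper name `pcm_bitrate_kbps` is hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, channels: int, bits_per_sample: int) -> float:
    """Bit rate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * channels * bits_per_sample / 1000


# 48 kHz stereo comes from the abstract; 16 bits per sample is an assumption.
raw_kbps = pcm_bitrate_kbps(sample_rate_hz=48_000, channels=2, bits_per_sample=16)

# Compression ratio at the two ends of the codec's stated bit-rate range.
ratios = {codec_kbps: raw_kbps / codec_kbps for codec_kbps in (4, 16)}

print(f"raw PCM: {raw_kbps:.0f} kbps")
for codec_kbps, ratio in ratios.items():
    print(f"{codec_kbps} kbps -> {ratio:.0f}x smaller than raw PCM")
```

Under that assumption, raw audio runs at 1536 kbps, so the stated range corresponds to roughly a 96x to 384x reduction in bit rate.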
