SpectroStream: A Versatile Neural Codec for General Audio

Yunpeng Li; Kehang Han; Brian McWilliams; Zalan Borsos; Marco Tagliasacchi

arXiv:2508.05207·cs.SD·August 8, 2025


TL;DR

SpectroStream is a neural audio codec that achieves high-quality, full-band stereo music compression at low bit rates by leveraging time-frequency-domain representations and a delayed-fusion strategy for multi-channel audio.

Contribution

The paper introduces SpectroStream, a neural codec that extends SoundStream from 24 kHz monophonic audio to 48 kHz stereo audio, with improved quality and a new delayed-fusion strategy for handling multiple channels.
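To make the idea of "delayed fusion" concrete, here is a minimal sketch of a two-stage encoder in which each channel is first processed on its own (with shared weights) and the channels are only combined in a later stage. The abstract does not describe SpectroStream's actual layers, so the function names (`per_channel_stage`, `fusion_stage`, `delayed_fusion_encode`), the `tanh` layers, and all sizes are hypothetical illustrations, not the authors' design.

```python
import numpy as np

# Hypothetical two-stage encoder illustrating "delayed fusion": channels are
# processed independently first and only combined in a later stage.
# Layer types and sizes are illustrative assumptions, not SpectroStream's.

def per_channel_stage(x, w):
    """Early stage: apply the same weights w to every channel separately.

    x: (channels, frames, features), w: (features, hidden)
    returns: (channels, frames, hidden); no information crosses channels here.
    """
    return np.tanh(x @ w)

def fusion_stage(h, w_fuse):
    """Late stage: concatenate per-channel features and mix them jointly.

    h: (channels, frames, hidden), w_fuse: (channels * hidden, out)
    returns: (frames, out)
    """
    channels, frames, hidden = h.shape
    # Put channels next to features, then flatten them into one vector per frame.
    stacked = h.transpose(1, 0, 2).reshape(frames, channels * hidden)
    return np.tanh(stacked @ w_fuse)

def delayed_fusion_encode(x, w, w_fuse):
    """Run the per-channel stage first, then fuse the channels."""
    return fusion_stage(per_channel_stage(x, w), w_fuse)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10, 8))        # stereo: 2 channels, 10 frames, 8 features
w = rng.standard_normal((8, 16))
w_fuse = rng.standard_normal((2 * 16, 12))
z = delayed_fusion_encode(x, w, w_fuse)
print(z.shape)  # (10, 12)
```

The abstract describes delayed fusion as crucial for balancing per-channel acoustic quality and cross-channel phase consistency; in a toy model like this one, that balance would correspond to how many stages run before `fusion_stage` mixes the channels.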

Findings

01

Supports 48 kHz stereo music at 4-16 kbps

02

Outperforms previous codecs in audio quality at low bit rates

03

Uses a novel neural architecture operating in the time-frequency domain
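A time-frequency representation is typically obtained with a short-time Fourier transform (STFT). The sketch below turns 48 kHz stereo audio into a real-valued array by stacking the real and imaginary parts of each STFT bin, which is one common way to feed complex spectra to a neural network. The abstract does not state SpectroStream's exact representation, so the Hann window, the 960-sample frame, the 480-sample hop, and the real/imaginary stacking are all assumptions.

```python
import numpy as np

SAMPLE_RATE = 48_000  # full-band sample rate stated in the paper
FRAME_LENGTH = 960    # hypothetical: 20 ms analysis window
HOP_LENGTH = 480      # hypothetical: 10 ms hop (50% overlap)

def stft(signal, frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH):
    """Complex short-time Fourier transform of a 1-D signal.

    Returns an array of shape (frames, frame_length // 2 + 1).
    """
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length]
        for i in range(n_frames)
    ])
    # Windowed real FFT of each frame: one row of frequency bins per frame.
    return np.fft.rfft(frames * window, axis=-1)

def stereo_tf_features(audio):
    """Map stereo audio (2, samples) to real-valued features of shape
    (2, frames, bins, 2), stacking real and imaginary parts per bin."""
    spec = np.stack([stft(channel) for channel in audio])
    return np.stack([spec.real, spec.imag], axis=-1)

# One second of a 1 kHz tone in both channels.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = np.stack([np.sin(2 * np.pi * 1000 * t)] * 2)
features = stereo_tf_features(audio)
print(features.shape)  # (2, 99, 481, 2)
```

With these assumed settings the bins are 48000 / 960 = 50 Hz apart, so the 1 kHz test tone peaks in bin 20 of every frame.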

Abstract

We propose SpectroStream, a full-band multi-channel neural audio codec. Successor to the well-established SoundStream, SpectroStream extends its capability beyond 24 kHz monophonic audio and enables high-quality reconstruction of 48 kHz stereo music at bit rates of 4-16 kbps. This is accomplished with a new neural architecture that leverages audio representation in the time-frequency domain, which leads to better audio quality, especially at higher sample rates. The model also uses a delayed-fusion strategy to handle multi-channel audio, which is crucial in balancing per-channel acoustic quality and cross-channel phase consistency.
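To put the 4-16 kbps range in perspective, the short calculation below compares it with uncompressed 48 kHz stereo PCM. The 16-bit reference depth is an assumption chosen for illustration (a common PCM format); the abstract does not name a reference format.

```python
def pcm_bitrate_kbps(sample_rate_hz, channels, bit_depth):
    """Bit rate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * channels * bit_depth / 1000

def compression_ratio(codec_kbps, sample_rate_hz=48_000, channels=2, bit_depth=16):
    """How many times smaller the coded stream is than raw PCM.

    Defaults describe 48 kHz stereo at an assumed 16-bit depth.
    """
    return pcm_bitrate_kbps(sample_rate_hz, channels, bit_depth) / codec_kbps

for kbps in (4, 8, 16):
    print(f"{kbps:>2} kbps -> {compression_ratio(kbps):.0f}x smaller than 16-bit PCM")
```

Under this assumption, raw 48 kHz stereo runs at 1536 kbps, so 16 kbps is a 96x reduction and 4 kbps a 384x reduction.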
