Learning Source Disentanglement in Neural Audio Codec
Xiaoyu Bie, Xubo Liu, Gaël Richard

TL;DR
This paper introduces SD-Codec, a neural audio codec that jointly learns audio resynthesis and source separation, enabling better disentanglement of sound sources like speech and music while maintaining high-quality audio reconstruction.
Contribution
The paper proposes a novel source-disentangled neural audio codec that explicitly separates different sound domains into distinct codebooks, improving interpretability and controllability.
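As a rough illustration of what "distinct codebooks per sound domain" can mean, the following minimal sketch quantizes each domain's latent frames against that domain's own codebook. It is not the authors' implementation: the domain names, the dimensions, the random codebooks, the helper names (`quantize`, `encode_sources`), and the choice to form a mixture latent by summing the per-domain quantized latents are all assumptions made for this example.

```python
import random

random.seed(0)

DOMAINS = ("speech", "music", "sfx")  # hypothetical domain labels
LATENT_DIM = 4                        # toy latent size (assumption)
CODEBOOK_SIZE = 8                     # toy codebook size (assumption)
NUM_FRAMES = 5


def random_vector(dim):
    """Draw a toy latent vector; stands in for a learned embedding."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


# One independent codebook per domain (random here, learned in a real codec).
codebooks = {
    d: [random_vector(LATENT_DIM) for _ in range(CODEBOOK_SIZE)] for d in DOMAINS
}


def quantize(frame, codebook):
    """Return (index, code vector) of the codebook entry nearest to `frame`."""
    def sq_dist(code):
        return sum((x - c) ** 2 for x, c in zip(frame, code))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


def encode_sources(source_latents):
    """Quantize each domain's frames with that domain's own codebook."""
    tokens, quantized = {}, {}
    for domain, frames in source_latents.items():
        pairs = [quantize(f, codebooks[domain]) for f in frames]
        tokens[domain] = [i for i, _ in pairs]
        quantized[domain] = [q for _, q in pairs]
    return tokens, quantized


# Toy per-domain latents standing in for encoder outputs.
source_latents = {
    d: [random_vector(LATENT_DIM) for _ in range(NUM_FRAMES)] for d in DOMAINS
}
tokens, quantized = encode_sources(source_latents)

# Assumption: the mixture latent is the frame-wise sum of the domain latents.
mixture_latent = [
    [sum(vals) for vals in zip(*(quantized[d][t] for d in DOMAINS))]
    for t in range(NUM_FRAMES)
]
```

Because each domain keeps its own token stream, a single source could in principle be decoded or edited by selecting only that domain's tokens, which is the kind of controllability the contribution statement refers to.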
Findings
SD-Codec maintains competitive audio resynthesis quality.
It demonstrates successful disentanglement of sources in the latent space.
It enhances interpretability and control in audio generation.
Abstract
Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential discrepancies between sound domains like speech, music, and environmental sound effects. This oversight complicates data modeling and poses additional challenges to the controllability of sound generation. To tackle these issues, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
