Learning Source Disentanglement in Neural Audio Codec

Xiaoyu Bie; Xubo Liu; Gaël Richard

arXiv:2409.11228 · cs.SD · February 12, 2025

Open Access

TL;DR

This paper introduces SD-Codec, a neural audio codec that jointly learns audio resynthesis and source separation, enabling better disentanglement of sound sources like speech and music while maintaining high-quality audio reconstruction.

Contribution

The paper proposes a novel source-disentangled neural audio codec that explicitly separates different sound domains into distinct codebooks, improving interpretability and controllability.
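The idea of routing each sound domain to its own codebook can be illustrated with a toy sketch. This is not the paper's implementation: the domain labels, codebook sizes, random initialization, and the helper names `make_codebook`, `quantize`, `encode_sources`, and `decode_tokens` are all assumptions made for illustration, and plain nearest-neighbour vector quantization stands in for whatever quantizer SD-Codec actually uses.

```python
import random

# Assumed domain labels, following the abstract's speech / music / sound effects split.
DOMAINS = ("speech", "music", "sfx")


def make_codebook(num_codes, dim, seed):
    """Build a toy codebook of random vectors (a trained codec would learn these)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_codes)]


def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


# One independent codebook per domain (hypothetical sizes: 8 codes of dimension 4).
codebooks = {name: make_codebook(8, 4, seed) for seed, name in enumerate(DOMAINS)}


def encode_sources(latents):
    """Map each domain's latent vector to a token from that domain's own codebook."""
    return {name: quantize(vec, codebooks[name])[0] for name, vec in latents.items()}


def decode_tokens(tokens):
    """Look up each token in the codebook of the domain it belongs to."""
    return {name: codebooks[name][idx] for name, idx in tokens.items()}
```

Because every token is tied to a domain-specific codebook, a downstream model could in principle read or edit the speech tokens without touching the music tokens, which is the kind of interpretability and control the contribution statement refers to.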

Findings

01

SD-Codec maintains competitive audio resynthesis quality.

02

SD-Codec successfully disentangles sources in the latent space.

03

SD-Codec enhances interpretability and control in audio generation.

Abstract

Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential discrepancies between sound domains like speech, music, and environmental sound effects. This oversight complicates data modeling and poses additional challenges to the controllability of sound generation. To tackle these issues, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing