Source-Aware Neural Speech Coding for Noisy Speech Compression
Haici Yang, Kai Zhen, Seungkwon Beack, Minje Kim

TL;DR
This paper presents SANAC, a neural speech coding system that explicitly separates and encodes the underlying sources in noisy speech, so it can prioritize the more important source and recover the original noisy speech better than a baseline neural audio codec.
Contribution
It introduces a source-aware neural audio coding system that combines source separation and adaptive bit allocation in a unified neural framework.
Findings
SANAC outperforms baseline neural codecs in noisy speech reconstruction.
It can allocate different numbers of bits to the underlying sources, so the more important source sounds better in the decoded signal.
Both objective and subjective tests show improved recovery of the original noisy speech.
Abstract
This paper introduces a novel neural network-based speech coding system that can process noisy speech effectively. The proposed source-aware neural audio coding (SANAC) system harmonizes a deep autoencoder-based source separation model and a neural coding system so that it can explicitly perform source separation and coding in the latent space. An added benefit of this system is that the codec can allocate a different amount of bits to the underlying sources so that the more important source sounds better in the decoded signal. We target a new use case where the user on the receiver side cares about the quality of the non-speech components in speech communication, while the speech source still carries the most crucial information. Both objective and subjective evaluation tests show that SANAC can recover the original noisy speech better than the baseline neural audio coding system,…
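The idea of giving each source its own bit budget in the latent space can be illustrated with a small sketch. This is not the paper's implementation: the random stand-in for the encoder output, the even split of the latent vector into a "speech" half and a "noise" half, the uniform scalar codebooks, and the codebook sizes are all assumptions made only for illustration.

```python
import numpy as np

def quantize(latent, codebook):
    """Map each latent value to its nearest codebook entry (hard assignment)."""
    idx = np.abs(latent[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx], idx

def bits_per_value(codebook):
    """Bits per quantized value with fixed-length codes (no entropy coding)."""
    return float(np.log2(len(codebook)))

rng = np.random.default_rng(0)
latent = rng.standard_normal(64)  # hypothetical stand-in for an encoder output

# Assumption: the first half of the latent carries speech, the second half noise.
speech_part, noise_part = latent[:32], latent[32:]

# Assumption: a finer codebook for the more important source.
speech_codebook = np.linspace(-2.0, 2.0, 32)  # 5 bits per value
noise_codebook = np.linspace(-2.0, 2.0, 4)    # 2 bits per value

speech_hat, _ = quantize(speech_part, speech_codebook)
noise_hat, _ = quantize(noise_part, noise_codebook)

speech_bits = speech_part.size * bits_per_value(speech_codebook)
noise_bits = noise_part.size * bits_per_value(noise_codebook)
print(speech_bits, noise_bits)  # 160.0 64.0
```

In this toy setup both sources occupy the same number of latent dimensions, and the speech half costs 160 bits per frame against 64 for the noise half, purely because of the codebook sizes. A trained system would learn the separation and the quantizer jointly instead of fixing them by hand.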
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
