The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs
Samir Sadok, Laurent Girin, Xavier Alameda-Pineda

TL;DR
This paper introduces shape-gain decomposition into neural audio codecs, improving bitrate-distortion performance and reducing complexity by separately encoding gain and shape, inspired by classical speech coding techniques.
Contribution
The paper proposes a novel shape-gain decomposition method for neural audio codecs, enhancing robustness and efficiency by separating gain and shape processing.
Findings
Significant bitrate-distortion improvements achieved
Massive reduction in computational complexity
Enhanced robustness to input signal level variations
Abstract
Neural audio codecs (NACs) typically encode the short-term energy (gain) and normalized structure (shape) of speech/audio signals jointly within the same latent space. As a result, they are not very robust to a global variation of the input signal level, in the sense that such a variation has a strong influence on the embedding vectors at the output of the encoder and on their quantization. This methodology is inherently inefficient, leading to codebook redundancy and suboptimal bitrate-distortion performance. To address these limitations, we propose to introduce shape-gain decomposition, widely used in classical speech/audio coding, into the NAC framework. The principle of the proposed Equalizer methodology is to decompose the input signal -- before the NAC encoder -- into a gain and a normalized shape vector on a short-term basis. The shape vector is processed by the NAC, while the gain is…
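Illustration
The decomposition described in the abstract can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not give the frame length, the exact gain definition, or how the gain is coded, so the sketch assumes non-overlapping frames of 320 samples and takes the gain to be the L2 norm of each frame. The function names are hypothetical.

```python
import numpy as np

def shape_gain_decompose(signal, frame_len=320, eps=1e-8):
    """Split a 1-D signal into per-frame gains and unit-norm shape vectors.

    Assumptions (not taken from the paper): non-overlapping frames and
    an L2-norm gain. Trailing samples that do not fill a frame are dropped.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    gains = np.linalg.norm(frames, axis=1)              # one scalar per frame
    shapes = frames / np.maximum(gains, eps)[:, None]   # eps guards silent frames
    return gains, shapes

def shape_gain_reconstruct(gains, shapes):
    """Invert the decomposition by rescaling each shape vector by its gain."""
    return (gains[:, None] * shapes).reshape(-1)

# Toy signal: five frames of white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(5 * 320)

g1, s1 = shape_gain_decompose(x)
g2, s2 = shape_gain_decompose(0.1 * x)   # same signal, 20 dB quieter

# A global level change only rescales the gains; the shapes are unchanged.
shapes_match = np.allclose(s1, s2)
gains_scale = np.allclose(g2, 0.1 * g1)
```

In this sketch a global level change moves only the per-frame gains, while the shape vectors that would be passed to the codec stay the same. That is the robustness property the abstract attributes to separating gain from shape.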
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques
