High-Fidelity Music Vocoder using Neural Audio Codecs
Luca A. Lanzendörfer, Florian Grötschla, Michael Ungersböck, Roger Wattenhofer

TL;DR
This paper introduces DisCoder, a neural vocoder that combines a neural audio codec with a GAN-based architecture to produce high-fidelity polyphonic music and speech, achieving state-of-the-art music synthesis results and showing potential as a universal vocoder.
Contribution
The paper presents DisCoder, a neural vocoder that leverages a neural audio codec and adversarial (GAN) training for high-fidelity music and speech synthesis, advancing the state of the art in music vocoding.
Findings
Achieves state-of-the-art performance on several objective music synthesis metrics.
Outperforms previous models in a MUSHRA listening test.
Demonstrates competitive speech synthesis results.
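A MUSHRA test (MUltiple Stimuli with Hidden Reference and Anchor, ITU-R BS.1534) asks listeners to rate each system's output on a 0 to 100 scale against a hidden reference and low-quality anchors; systems are then compared by their mean score and its confidence interval. The sketch below shows only that aggregation step, assuming ratings pooled over listeners and items and a normal-approximation 95% interval. The function name `mushra_summary` and all scores are hypothetical and are not results from the paper.

```python
import math
import statistics


def mushra_summary(ratings: dict[str, list[float]], z: float = 1.96) -> dict[str, tuple[float, float]]:
    """Return (mean, half-width of an approximate 95% CI) for each system.

    `ratings` maps a system name to its 0-100 MUSHRA scores pooled over
    listeners and test items. The interval uses a normal approximation
    (z = 1.96), which is an assumption, not the paper's stated procedure.
    """
    summary = {}
    for system, scores in ratings.items():
        mean = statistics.fmean(scores)
        half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
        summary[system] = (mean, half_width)
    return summary


# Hypothetical scores for illustration only (not data from the paper).
ratings = {
    "hidden_reference": [100, 98, 100, 95],
    "system_a": [80, 85, 78, 90],
    "system_b": [60, 72, 65, 70],
}
summary = mushra_summary(ratings)
for name, (mean, ci) in summary.items():
    print(f"{name}: {mean:.1f} ± {ci:.1f}")
```

A system "outperforms" another in this kind of test when its mean is higher; overlapping intervals suggest the difference may not be significant, which is why published studies usually add a formal statistical test.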
Abstract
While neural vocoders have made significant progress in high-fidelity speech synthesis, their application to polyphonic music has remained underexplored. In this work, we propose DisCoder, a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms. Our approach first transforms the mel spectrogram into a lower-dimensional representation aligned with the Descript Audio Codec (DAC) latent space, then decodes it into an audio signal using a fine-tuned DAC decoder. DisCoder achieves state-of-the-art performance in music synthesis on several objective metrics and in a MUSHRA listening study. Our approach also shows competitive performance in speech synthesis, highlighting its potential as a universal vocoder.
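The two-stage design in the abstract (mel spectrogram → DAC-aligned latent → fine-tuned DAC decoder → waveform) can be illustrated at the level of tensor shapes. The sketch below is a toy under stated assumptions, not DisCoder's architecture: random per-frame linear maps stand in for both learned stages, and the settings are assumptions. As we understand the public 44.1 kHz DAC model, its total stride is 512 samples (about 86 latent frames per second) with a 1024-dimensional latent; the mel hop is assumed to equal that stride, and `N_MELS = 128`, `toy_encoder`, and `toy_decoder` are hypothetical.

```python
import numpy as np

# Assumed settings; only the 44.1 kHz output rate comes from the abstract.
SAMPLE_RATE = 44_100  # output sample rate
HOP = 512             # assumed DAC stride, also assumed as the mel hop length
N_MELS = 128          # hypothetical number of mel bands
LATENT_DIM = 1024     # assumed latent width of the 44.1 kHz DAC model

rng = np.random.default_rng(0)


def toy_encoder(mel: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stage 1 (hypothetical): map each mel frame to a DAC-shaped latent frame.

    mel has shape (N_MELS, T); returns shape (LATENT_DIM, T). A single linear
    map plus tanh stands in for the learned mel-to-latent encoder.
    """
    return np.tanh(w @ mel)


def toy_decoder(z: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stage 2 (hypothetical): turn each latent frame into HOP audio samples.

    z has shape (LATENT_DIM, T); returns a 1-D waveform of length T * HOP.
    A linear map stands in for the fine-tuned DAC decoder's upsampling stack.
    """
    blocks = w @ z                       # (HOP, T): one block of samples per frame
    return np.tanh(blocks.T.reshape(-1))  # concatenate blocks in time order, bound to [-1, 1]


T = 86  # roughly one second of frames: 44100 / 512 ≈ 86.1 frames per second
mel = rng.standard_normal((N_MELS, T))
w_enc = rng.standard_normal((LATENT_DIM, N_MELS)) / np.sqrt(N_MELS)
w_dec = rng.standard_normal((HOP, LATENT_DIM)) / np.sqrt(LATENT_DIM)

z = toy_encoder(mel, w_enc)
audio = toy_decoder(z, w_dec)
print(z.shape, audio.shape)  # (1024, 86) (44032,)
```

The point of the shape walk-through is that, when the mel hop matches the codec stride, every mel frame corresponds to exactly one codec latent frame, so the first stage only has to change the feature dimension while the (frozen or fine-tuned) codec decoder handles upsampling to 44.1 kHz.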
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
Methods: Dynamic Algorithm Configuration
