TL;DR
This paper introduces anti-aliasing techniques into neural audio synthesis models, significantly improving high-fidelity music and singing voice generation by reducing artifacts.
Contribution
It proposes differentiable anti-aliasing modules integrated into neural vocoders and codecs, creating Pupu-Vocoder and Pupu-Codec for better audio quality.
Findings
Pupu-Vocoder and Pupu-Codec outperform existing systems on singing voice, music, and audio.
They achieve comparable performance on speech.
The results demonstrate the effectiveness of the anti-aliasing modules.
Abstract
In neural audio synthesis, neural vocoders and codecs are models that reconstruct waveforms from acoustic and latent representations, which are essential to the resulting audio quality. While current models are capable of generating perceptually natural speech, they still struggle with high-fidelity music and singing voice synthesis, as severe aliasing artifacts are introduced by non-linear activation functions and upsampling layers in existing architectures. Although various anti-aliasing techniques have been proposed in digital signal processing, their integration into neural vocoders and codecs remains under-explored. This paper incorporates differentiable anti-aliasing techniques into the activation and upsampling modules to bridge this gap, and thus presents Pupu-Vocoder and Pupu-Codec. We build a test signal benchmark to evaluate the anti-aliased modules, and validate our proposed…
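To make the aliasing problem described in the abstract concrete, here is a minimal NumPy sketch. It is not the paper's module. It assumes a standard DSP remedy (oversample, apply the non-linearity, low-pass, downsample) and uses hypothetical helper names (`resample_fft`, `naive_activation`, `oversampled_activation`, `bin_magnitude`). A 5 kHz sine at a 16 kHz sample rate is passed through `tanh`; its third harmonic (15 kHz) exceeds the 8 kHz Nyquist limit and folds back to 1 kHz unless the non-linearity runs at a higher rate.

```python
import numpy as np

def resample_fft(x, new_len):
    """Band-limited resampling of a periodic signal by zero-padding
    (upsampling) or truncating (low-pass + downsampling) its spectrum."""
    spec = np.fft.rfft(x)
    new_spec = np.zeros(new_len // 2 + 1, dtype=complex)
    n = min(len(spec), len(new_spec))
    new_spec[:n] = spec[:n]
    # irfft divides by new_len, so rescale to preserve amplitude.
    return np.fft.irfft(new_spec, n=new_len) * (new_len / len(x))

def naive_activation(x, drive=3.0):
    """Apply tanh directly at the original sample rate (aliasing-prone)."""
    return np.tanh(drive * x)

def oversampled_activation(x, drive=3.0, factor=4):
    """Illustrative anti-aliased activation (an assumption, not the paper's design):
    upsample, apply tanh, then low-pass and downsample."""
    up = resample_fft(x, factor * len(x))
    y = np.tanh(drive * up)
    return resample_fft(y, len(x))

def bin_magnitude(x, freq_hz, sample_rate):
    """Amplitude of the spectral component at freq_hz (must fall on an exact bin)."""
    spec = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    return spec[int(round(freq_hz * len(x) / sample_rate))]

sample_rate = 16000          # Hz, Nyquist limit is 8 kHz
n = 1600                     # 10 Hz bin spacing, so 1 kHz and 5 kHz are exact bins
t = np.arange(n) / sample_rate
x = np.sin(2 * np.pi * 5000 * t)

# The 15 kHz third harmonic of tanh folds back to 16 kHz - 15 kHz = 1 kHz.
alias_naive = bin_magnitude(naive_activation(x), 1000, sample_rate)
alias_aa = bin_magnitude(oversampled_activation(x), 1000, sample_rate)
print(f"1 kHz alias, naive:       {alias_naive:.5f}")
print(f"1 kHz alias, oversampled: {alias_aa:.5f}")
```

The oversampled path does not remove aliasing completely, since harmonics above the raised Nyquist limit still fold back, but those harmonics are much weaker. A learned model would also need every step to be differentiable, which is the gap the abstract says the paper targets.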
