Learning Neural Vocoder from Range-Null Space Decomposition
Andong Li, Tong Lei, Zhihang Sun, Rilin Chen, Erwei Yin, Xiaodong Li, and Chengshi Zheng

TL;DR
This paper introduces a novel neural vocoder leveraging range-null space decomposition in the time-frequency domain, improving spectral detail generation and achieving state-of-the-art results with lightweight models.
Contribution
It proposes a new dual-path neural vocoder framework based on classical range-null decomposition (RND) theory, combining a linear domain shift from the mel scale to the linear scale with learnable spectral detail modeling.
Findings
Achieves state-of-the-art performance on LJSpeech and LibriTTS benchmarks.
Uses lightweight network parameters while maintaining high quality.
Demonstrates effective spectral detail reconstruction via range-null space decomposition.
Abstract
Despite the rapid development of neural vocoders in recent years, they still suffer from intrinsic challenges such as opaque modeling and a parameter-performance trade-off. In this study, we propose a time-frequency (T-F) domain neural vocoder to address these challenges. Specifically, we connect the classical signal range-null decomposition (RND) theory with the vocoder task: the reconstruction of the target spectrogram can be decomposed into the superposition of a range-space component and a null-space component, where the former is obtained by a linear domain shift from the original mel-scale domain to the target linear-scale domain, and the latter is instantiated via a learnable network for further spectral detail generation. Accordingly, we propose a novel dual-path framework, where the spectrum is hierarchically encoded/decoded, and the…
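The range-null decomposition described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes the mel compression is a fixed linear map `A`, uses the Moore-Penrose pseudo-inverse as the linear domain shift, and replaces the learnable network with random noise `r`. The matrix `A`, the sizes, and all variable names are hypothetical stand-ins chosen only to show the algebra.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: linear-frequency bins, mel bins, time frames.
n_freq, n_mels, n_frames = 64, 16, 5

# Stand-in for a mel filterbank (assumption: any fixed linear map works
# for the algebra; a real filterbank would be sparse and triangular).
A = rng.random((n_mels, n_freq))
A_pinv = np.linalg.pinv(A)  # linear "domain shift" from mel back to linear scale

# Target linear-scale magnitude spectrogram and its mel observation.
x = rng.random((n_freq, n_frames))
y = A @ x

# Range-space component: fully determined by the mel input.
range_part = A_pinv @ y

# Null-space component: in the paper this role is played by a learnable
# network; here r is random noise used purely as a placeholder.
r = rng.standard_normal((n_freq, n_frames))
null_part = r - A_pinv @ (A @ r)  # (I - A^+ A) r

x_hat = range_part + null_part

# The null-space part is invisible to the mel map, so x_hat always
# reproduces the observed mel spectrogram, whatever r is.
print(np.allclose(A @ null_part, 0.0, atol=1e-8))
print(np.allclose(A @ x_hat, y))
```

Under these assumptions, the mel input fixes the range-space part exactly, so the only thing left to estimate is the detail that the mel compression discards. Whether the paper applies the null-space projection explicitly or lets the network learn it implicitly is not stated in the truncated abstract.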
