Learning Neural Vocoder from Range-Null Space Decomposition

Andong Li; Tong Lei; Zhihang Sun; Rilin Chen; Erwei Yin; Xiaodong Li; and Chengshi Zheng

arXiv:2507.20731 · cs.SD · July 29, 2025

TL;DR

This paper introduces a novel neural vocoder leveraging range-null space decomposition in the time-frequency domain, improving spectral detail generation and achieving state-of-the-art results with lightweight models.

Contribution

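The paper proposes a dual-path neural vocoder framework grounded in classical range-null decomposition (RND) theory: a linear domain shift from the mel-scale domain to the linear-scale domain supplies the range-space component, and a learnable network models the remaining spectral detail as the null-space component.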
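As a rough illustration of the decomposition this contribution refers to, the standard range-null identity can be written as follows. The symbols are assumptions for this sketch and are not taken from the paper: $A$ stands for a mel filterbank matrix, $A^{\dagger}$ for its Moore-Penrose pseudo-inverse, $X$ for the target linear-scale spectrogram, $Y = AX$ for the mel spectrogram, and $f_{\theta}$ for a learnable network. Using the pseudo-inverse as the linear domain shift is also an assumption.

```latex
% Any target X splits into a range-space part and a null-space part:
X = A^{\dagger} A X + (I - A^{\dagger} A) X

% Replacing the unknown null-space content with a network output gives the estimate
\hat{X} = \underbrace{A^{\dagger} Y}_{\text{range space}}
        + \underbrace{(I - A^{\dagger} A)\, f_{\theta}(Y)}_{\text{null space}}

% Consistency with the mel input follows from A A^{\dagger} A = A and Y = A X:
A \hat{X} = A A^{\dagger} A X + (A - A A^{\dagger} A)\, f_{\theta}(Y) = A X = Y
```

Under these assumptions, whatever the network produces only affects the part of the spectrum that the mel projection cannot see, so the estimate always maps back to the given mel input.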

Findings

01. Achieves state-of-the-art performance on the LJSpeech and LibriTTS benchmarks.

02. Keeps the network lightweight in parameter count while maintaining high quality.

03. Demonstrates effective spectral detail reconstruction via range-null space decomposition.

Abstract

Despite the rapid development of neural vocoders in recent years, they usually suffer from intrinsic challenges such as opaque modeling and a parameter-performance trade-off. In this study, we propose an innovative time-frequency (T-F) domain-based neural vocoder to resolve these challenges. Specifically, we connect the classical signal range-null decomposition (RND) theory to the vocoder task: the reconstruction of the target spectrogram can be decomposed into the superimposition of a range-space component and a null-space component, where the former is enabled by a linear domain shift from the original mel-scale domain to the target linear-scale domain, and the latter is instantiated via a learnable network for further spectral detail generation. Accordingly, we propose a novel dual-path framework, where the spectrum is hierarchically encoded/decoded, and the…
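The range-space and null-space split described in the abstract can be checked numerically with a small NumPy sketch. This is not the authors' implementation: the random matrix `A` is a hypothetical stand-in for a mel filterbank, the pseudo-inverse is an assumed choice for the linear domain shift, and `detail` is random noise standing in for the output of a learnable network. The function name `range_null_reconstruct` is also made up for illustration.

```python
import numpy as np

def range_null_reconstruct(A, y, detail):
    """Combine a range-space estimate with a null-space projected detail term."""
    A_pinv = np.linalg.pinv(A)                    # Moore-Penrose pseudo-inverse, shape (n_freq, n_mel)
    range_part = A_pinv @ y                       # assumed linear shift from mel bins to linear bins
    null_proj = np.eye(A.shape[1]) - A_pinv @ A   # projector onto the null space of A
    null_part = null_proj @ detail                # keeps only the component that A cannot observe
    return range_part + null_part

rng = np.random.default_rng(0)
n_mel, n_freq, n_frames = 8, 32, 5

A = rng.random((n_mel, n_freq))                   # hypothetical stand-in for a mel filterbank
x_true = rng.random((n_freq, n_frames))           # hypothetical linear-scale magnitude spectrogram
y = A @ x_true                                    # mel-scale observation

detail = rng.standard_normal((n_freq, n_frames))  # stand-in for a network output (assumption)
x_hat = range_null_reconstruct(A, y, detail)

# Projecting the estimate back to the mel scale should reproduce the input.
consistent = np.allclose(A @ x_hat, y)
```

In this sketch `consistent` evaluates to `True` regardless of what `detail` contains, because the null-space projector removes any component that would change the mel-scale projection. That property is what lets a network add spectral detail without contradicting the mel input.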
