Time-Graph Frequency Representation with Singular Value Decomposition for Neural Speech Enhancement
Tingting Wang, Tianrui Wang, Meng Ge, Qiquan Zhang, Zirui Ge, Zhen Yang

TL;DR
This paper introduces a novel real-valued time-graph representation using GFT-SVD for neural speech enhancement, improving alignment of amplitude and phase modeling and outperforming existing methods in quality and intelligibility.
Contribution
The paper proposes a GFT-SVD-based real-valued time-graph representation that aligns the modeling of amplitude and phase in neural speech enhancement, removing the need to recover the target speech phase.
Findings
GFT-SVD outperforms GFT-EVD and STFT in speech enhancement tasks.
Real-valued GFT-SVD improves objective intelligibility and perceptual quality.
The method surpasses traditional two-stream network models in speech enhancement performance.
Abstract
Time-frequency (T-F) domain methods for monaural speech enhancement have benefited from the success of deep learning. Recent work has focused on designing two-stream network models that predict the amplitude mask and phase separately, or that couple amplitude and phase in Cartesian coordinates as real and imaginary pairs. However, most of these methods struggle to align the modeling of amplitude and phase (or of the real and imaginary pairs) within a two-stream network framework, which inevitably limits performance. In this paper, we introduce a graph Fourier transform defined with the singular value decomposition (GFT-SVD), which yields a real-valued time-graph representation for neural speech enhancement. This real-valued GFT-SVD representation makes it possible to align the modeling of amplitude and phase, avoiding the need to recover the target speech phase…
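The idea of a real-valued graph Fourier basis obtained from an SVD can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the adjacency matrix built by `shift_adjacency` (each sample pointing to the next `k` samples), the frame length, the hop size, and all function names are assumptions chosen only for illustration. The sketch relies on one fact that holds for any real adjacency matrix: its left singular vectors form a real orthogonal basis, so the transformed frames stay real-valued and the transform is inverted by a transpose.

```python
import numpy as np

def shift_adjacency(n, k=3):
    """Hypothetical directed graph: sample i points to the next k samples."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, i + 1 : i + 1 + k] = 1.0  # slice is clipped at the frame boundary
    return a

def gft_svd_basis(adjacency):
    """Left singular vectors of a real adjacency matrix (real, orthogonal)."""
    u, _, _ = np.linalg.svd(adjacency)
    return u

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

# Random stand-in for a speech waveform (assumption: any 1-D real signal works).
rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)

frames = frame_signal(signal, frame_len=64, hop=32)
basis = gft_svd_basis(shift_adjacency(64))

# Forward transform: project every frame onto the SVD basis.
time_graph = frames @ basis
# Inverse transform: the basis is orthogonal, so its transpose undoes it.
reconstructed = time_graph @ basis.T
```

Because `time_graph` is a single real-valued array, a network operating on it would not need separate amplitude and phase (or real and imaginary) streams, which is the property the abstract highlights.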
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
Methods: Focus · ALIGN
