Time-Graph Frequency Representation with Singular Value Decomposition for Neural Speech Enhancement

Tingting Wang, Tianrui Wang, Meng Ge, Qiquan Zhang, Zirui Ge, Zhen Yang

arXiv:2412.16823 · eess.AS · December 25, 2024 · ICASSP

TL;DR

This paper introduces a real-valued time-graph representation based on the graph Fourier transform with singular value decomposition (GFT-SVD) for neural speech enhancement. The representation aligns amplitude and phase modeling, and the resulting models outperform existing methods in speech quality and intelligibility.

Contribution

The paper proposes a GFT-SVD-based real-valued time-graph representation that better aligns amplitude and phase modeling in neural speech enhancement and removes the need to recover the target speech phase separately.
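A minimal NumPy sketch of what a real-valued time-graph representation can look like, under stated assumptions: each non-overlapping frame of N samples is treated as a signal on a graph, the graph's adjacency matrix A is factored as A = U Σ V^T, and the left singular vectors U serve as the transform basis, so a frame x maps to the real coefficients U^T x. The graph itself (a unit self-loop on every sample plus a directed edge to the next sample), the frame length, and all function names are hypothetical choices for illustration, not the construction used in the paper or in the Wangfighting0015/GFT_project repository.

```python
import numpy as np

def frame_adjacency(n: int) -> np.ndarray:
    # Hypothetical frame graph (an assumption, not the paper's construction):
    # a unit self-loop on every sample plus a directed unit edge to the next sample.
    return np.eye(n) + np.eye(n, k=-1)

def gft_svd_basis(adj: np.ndarray) -> np.ndarray:
    # adj = U diag(sigma) V^T; the left singular vectors U form a real orthogonal
    # basis, with columns ordered by descending singular value.
    u, _, _ = np.linalg.svd(adj)
    return u

def time_graph_representation(x: np.ndarray, u: np.ndarray) -> np.ndarray:
    # Split x into non-overlapping frames (trailing samples are dropped) and
    # transform each frame: row t holds (U^T x_t)^T, a real-valued vector.
    n = u.shape[0]
    n_frames = len(x) // n
    frames = x[: n_frames * n].reshape(n_frames, n)
    return frames @ u

def inverse_time_graph(coeffs: np.ndarray, u: np.ndarray) -> np.ndarray:
    # U is orthogonal, so each frame is recovered as U c_t; frames are concatenated.
    return (coeffs @ u.T).reshape(-1)

rng = np.random.default_rng(0)
signal = rng.standard_normal(1600)  # stand-in for a waveform
u = gft_svd_basis(frame_adjacency(64))
rep = time_graph_representation(signal, u)  # shape (25, 64), real-valued
recon = inverse_time_graph(rep, u)
```

Because U is real and orthogonal, the inverse is a single matrix product, so the representation is lossless and carries no separate phase component. A practical front end would more likely use overlapping windowed frames with overlap-add, as the STFT does.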

Findings

1. GFT-SVD outperforms GFT-EVD (the graph Fourier transform defined with eigenvalue decomposition) and the STFT on speech enhancement tasks.
2. The real-valued GFT-SVD representation improves objective intelligibility and perceptual quality.
3. The method outperforms conventional two-stream network models in speech enhancement performance.
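The real-valued property behind the GFT-EVD versus GFT-SVD comparison can be checked numerically. In graph signal processing, the eigenvectors of a directed cycle graph's adjacency matrix form a DFT basis, so an EVD-based GFT of a real frame yields complex coefficients, much like the STFT, while the SVD of any real matrix has real orthogonal factors. This sketch assumes a directed cycle purely for illustration; the graphs the paper actually uses for GFT-EVD and GFT-SVD are not described on this page, and the helper names are hypothetical.

```python
import numpy as np

def directed_cycle(n: int) -> np.ndarray:
    # Adjacency of a directed cycle: an edge from node t to node (t + 1) mod n.
    return np.roll(np.eye(n), 1, axis=0)

def gft_evd(adj: np.ndarray, x: np.ndarray) -> np.ndarray:
    # EVD-based GFT: adj = V diag(lambda) V^{-1}, coefficients = V^{-1} x.
    _, v = np.linalg.eig(adj)
    return np.linalg.solve(v, x)

def gft_svd(adj: np.ndarray, x: np.ndarray) -> np.ndarray:
    # SVD-based GFT: adj = U diag(sigma) V^T, coefficients = U^T x.
    u, _, _ = np.linalg.svd(adj)
    return u.T @ x

rng = np.random.default_rng(0)
frame = rng.standard_normal(8)
adj = directed_cycle(8)
evd_coeffs = gft_evd(adj, frame)  # complex: eigenvalues are roots of unity
svd_coeffs = gft_svd(adj, frame)  # real: U is a real orthogonal matrix
print(evd_coeffs.dtype, svd_coeffs.dtype)
```

With a directed cycle the adjacency matrix is a permutation, so every singular value equals 1 and the SVD basis is not unique; a weighted graph with distinct singular values would give a well-defined ordering of graph frequencies.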

Abstract

Time-frequency (T-F) domain methods for monaural speech enhancement have benefited from the success of deep learning. Recent work has focused on two-stream network models that either predict the amplitude mask and the phase separately or couple amplitude and phase in Cartesian coordinates as real and imaginary pairs. However, most of these methods struggle to align the modeling of amplitude and phase (or of the real and imaginary parts) within a two-stream framework, which inevitably limits performance. In this paper, we introduce a graph Fourier transform defined with the singular value decomposition (GFT-SVD), resulting in a real-valued time-graph representation for neural speech enhancement. This real-valued GFT-SVD representation makes it possible to align the modeling of amplitude and phase, thereby avoiding the need to recover the target speech phase…
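To illustrate why a real-valued representation sidesteps phase recovery, the sketch below computes an oracle real-valued mask M = S / Y (the elementwise ratio of clean and noisy GFT-SVD coefficients) and applies it to a noisy frame. One real mask value per coefficient is enough to recover the clean frame exactly, whereas a real magnitude mask on STFT coefficients leaves the noisy phase untouched. The frame graph, the oracle mask, and the function names are hypothetical; this page does not say which training target or network the paper uses.

```python
import numpy as np

def gft_svd_basis(n: int) -> np.ndarray:
    # Hypothetical frame graph (an assumption): unit self-loops plus a directed
    # unit edge from each sample to the next. U = left singular vectors.
    u, _, _ = np.linalg.svd(np.eye(n) + np.eye(n, k=-1))
    return u

def oracle_real_mask(clean: np.ndarray, noisy: np.ndarray, u: np.ndarray) -> np.ndarray:
    # Elementwise ratio S / Y of GFT-SVD coefficients; near-zero Y entries get 0.
    s_coef, y_coef = u.T @ clean, u.T @ noisy
    return np.divide(s_coef, y_coef, out=np.zeros_like(s_coef), where=np.abs(y_coef) > 1e-8)

def apply_mask(noisy: np.ndarray, mask: np.ndarray, u: np.ndarray) -> np.ndarray:
    # Scale each real coefficient, then invert with U (orthogonal, so U^{-1} = U^T).
    return u @ (mask * (u.T @ noisy))

rng = np.random.default_rng(1)
u = gft_svd_basis(64)
clean = rng.standard_normal(64)
noisy = clean + 0.5 * rng.standard_normal(64)
mask = oracle_real_mask(clean, noisy, u)
enhanced = apply_mask(noisy, mask, u)
```

A trained model would instead estimate M (or the clean coefficients directly) from the noisy input, and practical masks are usually bounded, so the exact recovery shown here holds only for this unbounded oracle.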

Code & Models

Repositories

Wangfighting0015/GFT_project
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Focus · ALIGN