Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement

Tao Zheng; Liejun Wang; Yinfeng Yu

arXiv:2408.06911·eess.AS·August 14, 2024


Open Access

TL;DR

This paper introduces HFSDA, a speech enhancement framework that fuses heterogeneous spatial features (self-supervised embeddings and STFT spectrogram features) with a dual-dimension attention mechanism and ODConv to improve speech clarity in noisy environments.

Contribution

The study integrates heterogeneous spatial features and dual-dimension attention with ODConv and an enhanced Conformer, reaching performance comparable to state-of-the-art models on VCTK-DEMAND.

Findings

01 HFSDA performs comparably to state-of-the-art models on VCTK-DEMAND.

02 The dual-dimension attention improves focus on critical speech features.

03 ODConv enhances multi-dimensional feature extraction.

Abstract

Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research. In addressing speech tasks, confining the attention mechanism solely to the temporal dimension poses limitations in effectively focusing on critical speech features. Considering the aforementioned issues, our study introduces a novel speech enhancement framework, HFSDA, which skillfully integrates heterogeneous spatial features and incorporates a dual-dimension attention mechanism to significantly enhance speech clarity and quality in noisy environments. By leveraging self-supervised learning embeddings in tandem with Short-Time Fourier Transform (STFT) spectrogram features, our model excels at capturing both high-level semantic information and detailed spectral data, enabling a more thorough analysis and…
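To make the idea of attention along more than the temporal dimension concrete, here is a minimal NumPy sketch. It is not the authors' architecture: the abstract does not specify the second dimension or the layer design, so this assumes the second dimension is frequency, omits learned query/key/value projections, and combines the two branches with a plain residual sum. The function names `self_attention` and `dual_dimension_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention without learned projections.

    x has shape (..., n, d); attention runs across the n positions.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., n, n)
    return softmax(scores, axis=-1) @ x               # (..., n, d)

def dual_dimension_attention(feats):
    """Attend along time and along frequency (assumed second dimension).

    feats has shape (T, F, C): time frames, frequency bins, channels.
    """
    # Time branch: for each frequency bin, attend across frames.
    time_in = np.swapaxes(feats, 0, 1)                      # (F, T, C)
    time_out = np.swapaxes(self_attention(time_in), 0, 1)   # (T, F, C)
    # Frequency branch: for each frame, attend across bins.
    freq_out = self_attention(feats)                        # (T, F, C)
    # Residual sum of both branches (an illustrative choice, not from the paper).
    return feats + time_out + freq_out

rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 33, 8))  # toy spectrogram-like features
out = dual_dimension_attention(feats)
print(out.shape)
```

A time-only variant would drop `freq_out`, so each output frame could mix information only across frames within a single frequency bin; the frequency branch lets each frame also reweight its own bins.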


Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Softmax · Attention Is All You Need · Convolution