Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement
Tao Zheng, Liejun Wang, Yinfeng Yu

TL;DR
This paper introduces HFSDA, a speech enhancement framework that fuses heterogeneous spatial features (self-supervised embeddings and STFT spectrogram features) and applies dual-dimension attention, together with ODConv, to improve speech clarity in noisy environments.
Contribution
The study presents a paradigm that integrates heterogeneous spatial features and dual-dimension attention, along with ODConv and an enhanced Conformer, for speech enhancement.
Findings
HFSDA performs comparably to state-of-the-art models on VCTK-DEMAND.
The dual-dimension attention improves focus on critical speech features.
ODConv enhances multi-dimensional feature extraction.
Abstract
Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research. In addressing speech tasks, confining the attention mechanism solely to the temporal dimension poses limitations in effectively focusing on critical speech features. Considering the aforementioned issues, our study introduces a novel speech enhancement framework, HFSDA, which skillfully integrates heterogeneous spatial features and incorporates a dual-dimension attention mechanism to significantly enhance speech clarity and quality in noisy environments. By leveraging self-supervised learning embeddings in tandem with Short-Time Fourier Transform (STFT) spectrogram features, our model excels at capturing both high-level semantic information and detailed spectral data, enabling a more thorough analysis and…
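The abstract's point about attention that is not confined to the temporal dimension can be illustrated with a small sketch. The code below is not the authors' HFSDA architecture: it assumes a feature tensor of shape (time frames, frequency bins, channels), uses plain scaled dot-product self-attention with identity projections (no learned weights), and combines the two attention outputs with a residual sum. The function names `self_attention` and `dual_dimension_attention` and the combination rule are hypothetical choices made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., L, C). Queries, keys, and values are all x itself
    (identity projections), an assumption made to keep the sketch short.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, C)

def dual_dimension_attention(feats):
    """Attend along time and along frequency (hypothetical combination).

    feats has shape (T, F, C): time frames, frequency bins, channels.
    """
    # Time attention: for each frequency bin, attend across frames.
    time_out = np.swapaxes(self_attention(np.swapaxes(feats, 0, 1)), 0, 1)
    # Frequency attention: for each frame, attend across frequency bins.
    freq_out = self_attention(feats)
    # Residual sum of both views; the paper may combine them differently.
    return feats + time_out + freq_out

rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 65, 8))  # 50 frames, 65 bins, 8 channels
out = dual_dimension_attention(feats)
print(out.shape)  # (50, 65, 8)
```

In this sketch the time branch lets each frequency bin look across frames, while the frequency branch lets each frame look across bins, so a feature can be reweighted by context in either dimension rather than in time alone.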
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Softmax · Attention Is All You Need · Convolution
