DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement

Guochen Yu; Andong Li; Hui Wang; Yutian Wang; Yuxuan Ke; Chengshi Zheng

arXiv:2202.07931·cs.SD·August 2, 2022

Open Access · 1 Repo

TL;DR

This paper introduces DBT-Net, a dual-branch transformer-based framework for monaural speech enhancement that estimates magnitude and phase in two parallel branches built on attention-in-attention transformers.

Contribution

The dual-branch architecture with attention-in-attention transformers decouples magnitude estimation from phase estimation, which improves speech enhancement performance.

Findings

01  Outperforms previous systems on benchmark datasets.

02  Achieves state-of-the-art speech quality and intelligibility.

03  Effectively captures long-term dependencies with attention-in-attention transformers.

Abstract

The decoupling-style concept is gaining traction in the speech enhancement area: it decouples the original complex spectrum estimation task into multiple easier sub-tasks (i.e., magnitude-only recovery and residual complex spectrum estimation), resulting in better performance and easier interpretability. In this paper, we propose a dual-branch federative magnitude and phase estimation framework, dubbed DBT-Net, for monaural speech enhancement, aiming at recovering the coarse- and fine-grained regions of the overall spectrum in parallel. From the complementary perspective, the magnitude estimation branch is designed to filter out dominant noise components in the magnitude domain, while the complex spectrum purification branch is designed to inpaint the missing spectral details and implicitly estimate the phase information in the complex-valued spectral domain. To…
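
The split between a coarse magnitude estimate and a fine-grained complex correction can be illustrated with a small NumPy sketch. It assumes, which the excerpt above does not state, that the coarse estimate pairs the filtered magnitude with the noisy phase and that the complex branch output is added to it as a residual. The function name `fuse_branches` and its arguments are hypothetical and are not taken from the official repository.

```python
import numpy as np


def fuse_branches(noisy_spec, mag_gain, residual):
    """Combine two hypothetical branch outputs into one complex spectrum.

    noisy_spec: complex array (freq_bins, frames), STFT of the noisy input.
    mag_gain:   real array, same shape; assumed output of the magnitude
                branch, applied as a gain on the noisy magnitude.
    residual:   complex array, same shape; assumed output of the complex
                branch, carrying spectral detail and phase correction.
    """
    noisy_mag = np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)

    # Coarse estimate: filtered magnitude, phase left untouched (assumption).
    coarse = (mag_gain * noisy_mag) * np.exp(1j * noisy_phase)

    # Fine estimate: the complex residual modifies magnitude and phase jointly.
    return coarse + residual
```

With a gain of one and a zero residual, the function returns the noisy spectrum unchanged. With a zero residual only, the phase of the output always equals the noisy phase, which shows why a magnitude-only branch cannot recover phase on its own.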

Code & Models

Repositories

yuguochencuc/dbt-net
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques