TridentSE: Guiding Speech Enhancement with 32 Global Tokens

Dacheng Yin; Zhiyuan Zhao; Chuanxin Tang; Zhiwei Xiong; Chong Luo

arXiv:2210.12995·eess.AS·October 25, 2022

TL;DR

TridentSE is a speech enhancement architecture that keeps a T-F bin level representation for local detail and uses a small number of global tokens for global information, exchanging information between the two through cross attention. It reports a PESQ of 3.47 on VoiceBank+DEMAND at lower computational cost than previous methods.

Contribution

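The paper proposes TridentSE, which pairs a T-F (time-frequency) bin level representation with a small set of global tokens. Cross attention modules propagate information between the two, and the tokens are split into two groups that process the time axis and the frequency axis respectively.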

Findings

01. Achieves PESQ of 3.47 on VoiceBank+DEMAND
02. Achieves PESQ of 3.44 on DNS no-reverb
03. Outperforms previous methods with lower computational cost

Abstract

In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details. TridentSE maintains T-F bin level representation to capture details, and uses a small number of global tokens to process the global information. Information is propagated between the local and the global representations through cross attention modules. To capture both inter- and intra-frame information, the global tokens are divided into two groups to process along the time and the frequency axis respectively. A metric discriminator is further employed to guide our model to achieve higher perceptual quality. Even with significantly lower computational cost, TridentSE outperforms a variety of previous speech enhancement methods, achieving a PESQ of 3.47 on VoiceBank+DEMAND dataset and a PESQ of 3.44 on DNS no-reverb test…
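The local/global exchange described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`cross_attention`, `trident_like_block`), the equal 16/16 split of the 32 tokens, the absence of learned projections, and the residual sum at the end are all assumptions made for illustration. It only shows the general pattern of tokens gathering information from T-F bins along one axis and the bins reading it back.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Single-head scaled dot-product cross attention.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    queries: (..., Nq, C), context: (..., Nk, C) -> output: (..., Nq, C)
    """
    scale = 1.0 / np.sqrt(queries.shape[-1])
    scores = queries @ np.swapaxes(context, -1, -2) * scale  # (..., Nq, Nk)
    return softmax(scores, axis=-1) @ context


def trident_like_block(local, time_tokens, freq_tokens):
    """Hypothetical block: exchange information between T-F bins and two token groups.

    local:       (T, F, C) features, one vector per T-F bin
    time_tokens: (Nt, C) tokens that attend along the time axis (inter-frame)
    freq_tokens: (Nf, C) tokens that attend along the frequency axis (intra-frame)
    """
    T, F, C = local.shape

    # Time group: treat each frequency bin as a batch entry and attend over frames.
    local_ft = np.swapaxes(local, 0, 1)                                # (F, T, C)
    tok_t = np.broadcast_to(time_tokens, (F,) + time_tokens.shape)     # (F, Nt, C)
    tok_t = cross_attention(tok_t, local_ft)                           # tokens gather from bins
    read_t = cross_attention(local_ft, tok_t)                          # bins read from tokens

    # Frequency group: treat each frame as a batch entry and attend over bins.
    tok_f = np.broadcast_to(freq_tokens, (T,) + freq_tokens.shape)     # (T, Nf, C)
    tok_f = cross_attention(tok_f, local)                              # tokens gather from bins
    read_f = cross_attention(local, tok_f)                             # bins read from tokens

    # Assumption: combine the branches with a simple residual sum.
    return local + np.swapaxes(read_t, 0, 1) + read_f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, C = 100, 64, 8                       # illustrative sizes, not from the paper
    local = rng.standard_normal((T, F, C))
    time_tokens = rng.standard_normal((16, C))  # assumed 16 + 16 split of 32 tokens
    freq_tokens = rng.standard_normal((16, C))
    out = trident_like_block(local, time_tokens, freq_tokens)
    print(out.shape)
```

In this sketch, with T frames, F frequency bins, and N tokens in total, the attention score matrices hold T·F·N entries per direction (gather and read), compared with (T·F)² entries for full self-attention over all bins. That count describes the sketch only, not the paper's measured cost.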

Taxonomy

Topics: Speech and Audio Processing · Infant Health and Development · Speech Recognition and Synthesis

Methods: Test