An Exploration of Length Generalization in Transformer-Based Speech Enhancement

Qiquan Zhang; Hongxu Zhu; Xinyuan Qian; Eliathamby Ambikairajah; Haizhou Li

arXiv:2406.11401·eess.AS·June 18, 2024·Interspeech

Open Access

TL;DR

This paper investigates how Transformer-based speech enhancement models can generalize across different utterance lengths, emphasizing the role of position embeddings, especially relative position embeddings, in improving length generalization.

Contribution

The paper systematically explores how different position embedding schemes affect length generalization in Transformer speech enhancement models, and highlights the effectiveness of relative position embeddings.

Findings

01. Relative position embeddings outperform absolute position embeddings in length generalization.

02. Position embeddings significantly alleviate the impact of utterance length on model performance.

03. The study provides practical insights for designing more robust Transformer-based speech enhancement systems.

Abstract

The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In this paper, we conduct comprehensive experiments to explore the length generalization problem in speech enhancement with Transformer. Our findings first establish that position embedding provides an effective instrument to alleviate the impact of utterance length on Transformer-based speech enhancement. Specifically, we explore four different position embedding schemes to enable length generalization. The results confirm the superiority of relative position embeddings (RPEs) over absolute PE…
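To make the intuition behind relative position embeddings concrete, here is a minimal sketch in plain Python. It assumes a simple linear distance penalty (in the style of ALiBi) purely for illustration; the abstract does not say which four schemes the paper evaluates, and the names `relative_bias`, `slope`, and `attention_weights` are hypothetical. The point it shows is that a bias depending only on the offset between two frames is defined for any sequence length, so a model trained on short utterances can reuse it on longer ones, while the attention score matrix itself still grows quadratically with length.

```python
import math


def relative_bias(length, slope=0.5):
    """Bias b[i][j] = -slope * |i - j|; it depends only on the offset i - j.

    The linear penalty and the slope value are illustrative assumptions,
    not the scheme used in the paper.
    """
    return [[-slope * abs(i - j) for j in range(length)] for i in range(length)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, bias):
    """Add the position bias to the content scores, then normalize each row."""
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]


short_len, long_len = 4, 8
short_bias = relative_bias(short_len)
long_bias = relative_bias(long_len)

# The bias for the short sequence is exactly the top-left block of the
# bias for the long sequence: nothing new has to be learned for length 8.
same_block = all(
    long_bias[i][j] == short_bias[i][j]
    for i in range(short_len)
    for j in range(short_len)
)

# With all-zero content scores, attention is driven by the bias alone,
# so nearer frames receive more weight than distant ones.
zero_scores = [[0.0] * long_len for _ in range(long_len)]
weights = attention_weights(zero_scores, long_bias)

# The score matrix has length**2 entries, which is the quadratic cost
# that makes training on very long utterances expensive.
num_entries = long_len * long_len
```

By contrast, a learned absolute embedding table indexed by frame position has no entry for positions beyond the longest training utterance, which is one common explanation for why such schemes extrapolate poorly.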

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques

Methods: Linear Layer · Multi-Head Attention · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam