Monaural speech enhancement on drone via Adapter based transfer learning

Xingyu Chen; Hanwen Bi; Wei-Ting Lai; Fei Ma

arXiv:2405.10022 · eess.AS · October 21, 2024

Open Access

TL;DR

This paper introduces a transfer learning approach that uses a frequency-domain adapter to improve monaural speech enhancement on drones, handling rotor ego-noise at a lower computational cost than conventional fine-tuning.

Contribution

It proposes a frequency-domain bottleneck adapter for transfer learning: only the adapter is trained on drone noise while the pre-trained FRCRN stays frozen, enabling efficient speech enhancement on drones without fine-tuning the whole network.
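A minimal sketch of what a bottleneck adapter applied along the frequency axis could look like, written in Python with NumPy. The page does not describe the adapter's internals, so everything here is an assumption: the feature layout `(channels, time, freq)`, the hypothetical names `init_freq_adapter`, `freq_adapter` and `count_params`, the ReLU nonlinearity, the residual connection, and the zero-initialised up-projection (one common adapter choice that makes the module start as an identity map, so the frozen backbone's behaviour is unchanged before training).

```python
import numpy as np

def init_freq_adapter(num_freqs: int, bottleneck: int, seed: int = 0) -> dict:
    """Hypothetical adapter weights: down-projection F -> r, up-projection r -> F."""
    rng = np.random.default_rng(seed)
    return {
        "W_down": rng.normal(0.0, 0.02, size=(num_freqs, bottleneck)),
        "b_down": np.zeros(bottleneck),
        # Zero-initialised up-projection: the adapter starts as an identity map.
        "W_up": np.zeros((bottleneck, num_freqs)),
        "b_up": np.zeros(num_freqs),
    }

def freq_adapter(x: np.ndarray, p: dict) -> np.ndarray:
    """Apply a bottleneck adapter along the last (frequency) axis.

    x: feature map of shape (channels, time, freq), an assumed layout.
    Returns x + up(relu(down(x))) with the same shape as x.
    """
    h = np.maximum(x @ p["W_down"] + p["b_down"], 0.0)  # (C, T, r) bottleneck
    return x + h @ p["W_up"] + p["b_up"]                # residual connection

def count_params(p: dict) -> int:
    """Number of trainable adapter parameters (the backbone stays frozen)."""
    return sum(v.size for v in p.values())

if __name__ == "__main__":
    params = init_freq_adapter(num_freqs=257, bottleneck=32)
    feats = np.random.default_rng(1).normal(size=(4, 100, 257))
    out = freq_adapter(feats, params)
    print(out.shape, count_params(params))  # (4, 100, 257) 16737
```

With F = 257 frequency bins (as from a 512-point FFT, another assumption) and a bottleneck of r = 32, the adapter has 2·257·32 + 32 + 257 = 16,737 trainable parameters. In a transfer-learning setup like the one described, only these would be updated while the backbone's weights stay fixed.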

Findings

01 Effective speech quality enhancement demonstrated

02 More computationally efficient than traditional fine-tuning

03 Robust across different drone noise scenarios

Abstract

Monaural speech enhancement on drones is challenging because the ego-noise from the rotating motors and propellers leads to extremely low signal-to-noise ratios at onboard microphones. Although recent masking-based deep neural network methods excel at monaural speech enhancement, they struggle in the challenging drone noise scenario. Furthermore, existing drone noise datasets are limited, causing models to overfit. Considering the harmonic nature of drone noise, this paper proposes a frequency domain bottleneck adapter to enable transfer learning. Specifically, the adapter's parameters are trained on drone noise while the parameters of the pre-trained Frequency Recurrent Convolutional Recurrent Network (FRCRN) are kept fixed. Evaluation results demonstrate that the proposed method can effectively enhance speech quality. Moreover, it is a more efficient alternative to fine-tuning models for…
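To make the low signal-to-noise-ratio setting concrete, the toy sketch below builds a harmonic stand-in for rotor ego-noise and mixes it with a placeholder signal at a chosen SNR. It is an illustration only, not the paper's data pipeline: the fundamental frequency, the number of harmonics, the amplitude-modulated tone used in place of real speech, the -15 dB target, and the function names `harmonic_noise`, `mix_at_snr` and `snr_db` are all hypothetical.

```python
import numpy as np

def harmonic_noise(fs: int, dur: float, f0: float, n_harm: int, seed: int = 0) -> np.ndarray:
    """Toy ego-noise (hypothetical model): harmonics k*f0 with 1/k amplitudes plus weak broadband noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    phases = rng.uniform(0.0, 2 * np.pi, n_harm)
    k = np.arange(1, n_harm + 1)[:, None]  # harmonic index, shape (n_harm, 1)
    tones = np.sum(np.sin(2 * np.pi * f0 * k * t + phases[:, None]) / k, axis=0)
    return tones + 0.05 * rng.standard_normal(t.size)

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise power ratio in decibels."""
    return float(10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2)))

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, target_snr_db: float):
    """Scale `noise` so that the speech/noise power ratio equals target_snr_db, then add."""
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_s / (p_n * 10 ** (target_snr_db / 10)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Placeholder for speech: a 220 Hz tone with a 4 Hz amplitude envelope.
    speech = (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 220 * t)
    noise = harmonic_noise(fs, dur=1.0, f0=150.0, n_harm=20)
    noisy, scaled = mix_at_snr(speech, noise, target_snr_db=-15.0)
    print(round(snr_db(speech, scaled), 6))  # -15.0
```

At -15 dB the noise carries about 31.6 times the power of the speech (10^1.5 ≈ 31.6), which gives a sense of how little of the target signal a masking-based model has to work with.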

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Speech Recognition and Synthesis

Methods: Adapter