Monaural speech enhancement on drone via Adapter based transfer learning
Xingyu Chen, Hanwen Bi, Wei-Ting Lai, Fei Ma

TL;DR
This paper introduces a transfer learning approach that uses a frequency-domain adapter to improve monaural speech enhancement on drones, handling ego-noise at a lower computational cost than full fine-tuning.
Contribution
It proposes a novel frequency-domain bottleneck adapter for transfer learning, enabling efficient speech enhancement on drones without fine-tuning the full pre-trained model.
Findings
Effective speech quality enhancement demonstrated
More computationally efficient than traditional fine-tuning
Robust across different drone noise scenarios
Abstract
Monaural speech enhancement on drones is challenging because the ego-noise from the rotating motors and propellers results in extremely low signal-to-noise ratios at onboard microphones. Although recent masking-based deep neural network methods excel at monaural speech enhancement, they struggle with drone noise. Furthermore, existing drone noise datasets are limited, which causes models to overfit. Considering the harmonic nature of drone noise, this paper proposes a frequency-domain bottleneck adapter to enable transfer learning. Specifically, the adapter's parameters are trained on drone noise while the parameters of the pre-trained Frequency Recurrent Convolutional Recurrent Network (FRCRN) are kept fixed. Evaluation results demonstrate that the proposed method can effectively enhance speech quality. Moreover, it is a more efficient alternative to fine-tuning models for…
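The abstract does not spell out the adapter's internals, so the following is only a minimal NumPy sketch of the general idea, built on assumptions rather than the authors' implementation. It assumes a standard bottleneck adapter applied along the frequency axis of a (time x frequency) magnitude spectrogram: a down-projection to a hypothetical width r, a ReLU, an up-projection back to F bins, and a residual connection. The adapter feeds a frozen stand-in for the pre-trained enhancer; the single matrix `W_frozen` with a sigmoid mask only mimics a masking-based network and is not FRCRN. The adapter placement, the zero initialization of the up-projection, the MSE magnitude loss, and all function names are illustrative assumptions. The key point it shows is that gradient descent updates only the adapter parameters while the pre-trained weights stay fixed.

```python
import numpy as np

# Hypothetical sketch: a bottleneck adapter acting across the frequency axis of a
# (time x frequency) magnitude spectrogram, followed by a frozen stand-in for a
# pre-trained masking-based enhancer. Only the adapter's parameters are trained.


def init_adapter(num_freq, bottleneck, rng):
    """Down-projection F -> r, ReLU, up-projection r -> F."""
    return {
        "W_down": rng.normal(0.0, 0.1, size=(num_freq, bottleneck)),
        "b_down": np.zeros(bottleneck),
        # With W_up and b_up at zero the residual branch outputs zero, so the
        # adapter starts as an identity map and the frozen model is unchanged.
        "W_up": np.zeros((bottleneck, num_freq)),
        "b_up": np.zeros(num_freq),
    }


def forward(adapter, W_frozen, X):
    """Adapter (with residual) -> frozen mask estimator -> masked spectrogram."""
    pre = X @ adapter["W_down"] + adapter["b_down"]  # (T, r)
    H = np.maximum(pre, 0.0)                         # ReLU
    A = X + H @ adapter["W_up"] + adapter["b_up"]    # residual connection, (T, F)
    M = 1.0 / (1.0 + np.exp(-(A @ W_frozen)))        # sigmoid mask in (0, 1)
    S_hat = M * X                                    # enhanced magnitude estimate
    return S_hat, (pre, H, M)


def loss_and_grads(adapter, W_frozen, X, S):
    """MSE loss and its gradients with respect to the adapter parameters only."""
    S_hat, (pre, H, M) = forward(adapter, W_frozen, X)
    loss = np.mean((S_hat - S) ** 2)
    dS_hat = 2.0 * (S_hat - S) / S.size
    dZ = dS_hat * X * M * (1.0 - M)                  # through the mask and sigmoid
    dA = dZ @ W_frozen.T                             # no gradient kept for W_frozen
    dH = dA @ adapter["W_up"].T
    dpre = dH * (pre > 0)                            # ReLU derivative
    grads = {
        "W_up": H.T @ dA,
        "b_up": dA.sum(axis=0),
        "W_down": X.T @ dpre,
        "b_down": dpre.sum(axis=0),
    }
    return loss, grads


def train_adapter(adapter, W_frozen, X, S, lr=5.0, steps=300):
    """Plain gradient descent on the adapter; W_frozen is never modified."""
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(adapter, W_frozen, X, S)
        history.append(loss)
        for name, g in grads.items():
            adapter[name] -= lr * g
    return history


def count_params(adapter):
    """Number of trainable (adapter) parameters: 2*F*r + F + r."""
    return sum(p.size for p in adapter.values())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, r = 50, 16, 4
    X = np.abs(rng.normal(size=(T, F)))           # toy "noisy" magnitudes
    S = X * rng.uniform(0.1, 0.9, size=(1, F))    # toy "clean" target
    W_frozen = rng.normal(0.0, 0.1, size=(F, F))  # stand-in pre-trained weights
    adapter = init_adapter(F, r, rng)
    history = train_adapter(adapter, W_frozen, X, S)
    print(f"trainable adapter parameters: {count_params(adapter)}")
    print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

With the toy sizes above (F = 16 bins, r = 4), the adapter holds 2Fr + F + r = 148 trainable parameters, while `W_frozen` is only read. In a real system the frozen part would be the full pre-trained network, which is why training only the adapter can be cheaper than fine-tuning every weight.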
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Speech Recognition and Synthesis
Methods: Adapter
