Egonoise Resilient Source Localization and Speech Enhancement for Drones Using a Hybrid Model and Learning-Based Approach
Yihsuan Wu, Yukai Chiu, Michael Anthony, and Mingsian R. Bai

TL;DR
This paper introduces a hybrid system for drone microphone arrays that combines array signal processing (ASP) with deep neural networks (DNN) to localize and enhance speech at extremely low SNR, improving drone audition capabilities.
Contribution
It presents a novel hybrid approach combining array signal processing and deep learning for robust speech localization and enhancement on drones, addressing egonoise challenges.
Findings
Outperforms four baseline methods at SNRs as low as -30 dB.
Uses a six-microphone uniform circular array mounted on a quadcopter.
Validated on DREGON dataset and real measurements.
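For scale, the -30 dB figure in the findings can be converted from decibels to a linear ratio. The snippet below is only an illustrative conversion using the standard decibel definition; the function names are hypothetical and nothing here comes from the paper's code.

```python
def snr_db_to_power_ratio(snr_db):
    """Signal-to-noise power ratio for an SNR given in dB."""
    return 10.0 ** (snr_db / 10.0)


def snr_db_to_amplitude_ratio(snr_db):
    """Signal-to-noise amplitude (RMS) ratio for an SNR given in dB."""
    return 10.0 ** (snr_db / 20.0)


# At -30 dB the speech carries one thousandth of the noise power.
power_ratio = snr_db_to_power_ratio(-30.0)
amplitude_ratio = snr_db_to_amplitude_ratio(-30.0)
print(power_ratio, amplitude_ratio)
```

In other words, at -30 dB the rotor noise power is about 1000 times the speech power, or roughly 32 times larger in RMS amplitude.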
Abstract
Drones are becoming increasingly important in search and rescue missions and even in military operations. While the majority of drones are equipped with camera vision capabilities, the realm of drone audition remains underexplored due to the inherent challenge of mitigating the egonoise generated by the rotors. In this paper, we present a novel technique to address the extremely low signal-to-noise ratio (SNR) problem encountered by microphone-equipped drones. The technique is implemented using a hybrid approach that combines Array Signal Processing (ASP) and Deep Neural Networks (DNN) to enhance the speech signals captured by a six-microphone uniform circular array mounted on a quadcopter. The system localizes the target speaker through beamsteering and enhances the speech through a Generalized Sidelobe Canceller-DeepFilterNet 2 (GSC-DF2) system. To…
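To make the idea of localization by beamsteering on a six-microphone uniform circular array more concrete, here is a minimal sketch of a generic frequency-domain delay-and-sum scan over azimuth. It is not the authors' method: the array radius, the frequency band, the one-degree grid, the far-field plane-wave model, and all function names are assumptions chosen for illustration, and the toy input is noise-free.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def uca_positions(n_mics=6, radius=0.1):
    """(x, y) microphone positions on a circle; the radius is hypothetical."""
    return [
        (radius * math.cos(2 * math.pi * m / n_mics),
         radius * math.sin(2 * math.pi * m / n_mics))
        for m in range(n_mics)
    ]


def steering_vector(positions, azimuth, freq):
    """Far-field plane-wave phase at each microphone for a given azimuth (rad)."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)  # unit vector toward the source
    vec = []
    for x, y in positions:
        # Microphones closer to the source receive the wavefront earlier.
        delay = -(x * ux + y * uy) / SPEED_OF_SOUND
        vec.append(cmath.exp(-2j * math.pi * freq * delay))
    return vec


def steered_power(frame, positions, freqs, azimuth):
    """Delay-and-sum output power, summed over frequency bins.

    frame[k][m] is the complex spectrum of microphone m at frequency freqs[k].
    """
    total = 0.0
    for spectrum, freq in zip(frame, freqs):
        a = steering_vector(positions, azimuth, freq)
        y = sum(w.conjugate() * x for w, x in zip(a, spectrum))
        total += abs(y) ** 2
    return total


def localize(frame, positions, freqs, grid_deg):
    """Return the grid azimuth (degrees) with the highest steered power."""
    powers = [
        steered_power(frame, positions, freqs, math.radians(d)) for d in grid_deg
    ]
    best = max(range(len(grid_deg)), key=powers.__getitem__)
    return grid_deg[best], powers


# Toy check: a noise-free plane wave arriving from 60 degrees.
positions = uca_positions()
freqs = [300.0 + 100.0 * k for k in range(13)]  # 300 to 1500 Hz, assumed band
frame = [steering_vector(positions, math.radians(60), f) for f in freqs]
estimate, _ = localize(frame, positions, freqs, list(range(360)))
print(estimate)
```

In this noise-free toy case the scan recovers the simulated 60 degree direction. Real rotor egonoise is a different matter, and this plain scan makes no claim to handle the SNRs the paper targets.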
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis
