Egonoise Resilient Source Localization and Speech Enhancement for Drones Using a Hybrid Model and Learning-Based Approach

Yihsuan Wu, Yukai Chiu, Michael Anthony, and Mingsian R. Bai

arXiv:2508.06310·eess.AS·August 11, 2025

Open Access

TL;DR

This paper introduces a hybrid system for drone-mounted microphone arrays that combines array signal processing (ASP) with a deep neural network (DNN) to localize a target speaker and enhance speech under the extremely low SNR caused by rotor egonoise.

Contribution

It presents a novel hybrid approach combining array signal processing and deep learning for robust speech localization and enhancement on drones, addressing egonoise challenges.

Findings

01

Outperforms four baseline methods at SNRs as low as -30 dB.

02

Uses a six-microphone uniform circular array mounted on a quadcopter to capture spatial audio.

03

Validated on the DREGON dataset and on real measurements.

Abstract

Drones are becoming increasingly important in search and rescue missions and even in military operations. While the majority of drones are equipped with camera vision capabilities, drone audition remains underexplored because of the inherent challenge of mitigating the egonoise generated by the rotors. In this paper, we present a novel technique to address the extremely low signal-to-noise ratio (SNR) problem encountered by microphone-equipped drones. The technique is implemented using a hybrid approach that combines Array Signal Processing (ASP) and Deep Neural Networks (DNN) to enhance the speech signals captured by a six-microphone uniform circular array mounted on a quadcopter. The system localizes the target speaker through beamsteering and enhances speech through a Generalized Sidelobe Canceller-DeepFilterNet 2 (GSC-DF2) system. To…
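
The abstract states that localization relies on beamsteering with a six-microphone uniform circular array but, in the portion shown here, gives no algorithmic detail. The following is a minimal sketch of one common form of beamsteering, a narrowband delay-and-sum scan over azimuth, and should not be read as the authors' implementation. The array radius (0.1 m), the test frequency (1 kHz), the 5-degree scan grid, the far-field plane-wave model, and all function names are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def uca_positions(num_mics=6, radius=0.1):
    """(x, y) coordinates of a uniform circular array; the radius is an assumed value."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def steering_vector(positions, azimuth_deg, freq_hz):
    """Far-field plane-wave phase factors for a source at the given azimuth."""
    az = math.radians(azimuth_deg)
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    # A microphone closer to the source receives the wavefront earlier,
    # which appears as a phase advance relative to the array centre.
    return [
        cmath.exp(1j * k * (x * math.cos(az) + y * math.sin(az)))
        for x, y in positions
    ]


def steered_power(snapshot, positions, azimuth_deg, freq_hz):
    """Delay-and-sum output power for one narrowband snapshot (one value per mic)."""
    a = steering_vector(positions, azimuth_deg, freq_hz)
    y = sum(w.conjugate() * x for w, x in zip(a, snapshot)) / len(a)
    return abs(y) ** 2


def localize(snapshot, positions, freq_hz, step_deg=5):
    """Return the azimuth on the scan grid that maximizes the steered power."""
    grid = range(0, 360, step_deg)
    return max(grid, key=lambda az: steered_power(snapshot, positions, az, freq_hz))


if __name__ == "__main__":
    mics = uca_positions()
    # Noise-free synthetic snapshot from a hypothetical source at 60 degrees.
    snapshot = steering_vector(mics, 60, 1000.0)
    print(localize(snapshot, mics, 1000.0))
```

The synthetic snapshot is noise-free, so the scan simply recovers the azimuth used to generate it. At the SNRs the paper targets, a plain scan like this one would likely be dominated by rotor noise, which is the motivation the abstract gives for pairing array processing with a learned enhancement stage.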

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis