Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time
Yue Li, Koen V. Hindriks, Florian A. Kunneman

TL;DR
This paper introduces a novel two-mask Conformer-based GAN model for speech enhancement in robot ego speech filtering, addressing oversubtraction issues and enabling semi-real-time processing to improve speech recognition in noisy human-robot interaction environments.
Contribution
It proposes a new two-mask Conformer GAN model and an incremental processing method for semi-real-time speech enhancement in robot ego speech filtering.
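To make the idea of incremental processing concrete, here is a minimal sketch of one common way to feed streaming audio to a network that expects a fixed-length input: keep a rolling buffer of the expected length, push each new chunk in, run the model on the whole buffer, and emit only the part that corresponds to the newest chunk. This is an illustrative assumption, not the authors' method; the names `stream_enhance`, `identity_model`, and `window_len` are hypothetical.

```python
import numpy as np

def stream_enhance(chunks, model, window_len):
    """Run a fixed-length model over streaming chunks (illustrative sketch).

    `model` is assumed to map an array of length `window_len` to an
    array of the same length. The buffer starts as silence (zeros).
    """
    buffer = np.zeros(window_len, dtype=np.float64)
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        n = len(chunk)
        if n == 0:
            continue  # nothing new to process
        if n > window_len:
            raise ValueError("chunk is longer than the model window")
        # Drop the oldest n samples and append the new chunk.
        buffer = np.concatenate([buffer[n:], chunk])
        enhanced = model(buffer)
        # Emit only the samples that correspond to the new chunk.
        yield enhanced[-n:]

def identity_model(x):
    """Stand-in for a trained enhancement network (hypothetical)."""
    return x.copy()

chunks = [np.arange(0, 4), np.arange(4, 8)]
out = np.concatenate(list(stream_enhance(chunks, identity_model, window_len=8)))
```

With this design the latency is bounded by the chunk length rather than the window length, at the cost of re-running the model on overlapping context for every chunk.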
Findings
Significant improvement in speech recognition accuracy.
Effective compensation for oversubtracted fundamental frequency range.
Robust performance in unseen noise conditions.
Abstract
Spectral subtraction, widely used for its simplicity, has been applied to the Robot Ego Speech Filtering (RESF) problem: detecting the speech content of a human interruption in a robot's single-channel microphone recordings while the robot itself is speaking. However, this approach suffers from oversubtraction in the fundamental frequency range (FFR), which degrades speech content recognition. To address this, we propose a Two-Mask Conformer-based Metric Generative Adversarial Network (CMGAN) to enhance the detected speech and improve recognition results. Our model compensates for oversubtracted FFR values using high-frequency information and long-term features, and then denoises the new spectrogram. In addition, we introduce an incremental processing method that allows semi-real-time processing of streaming audio input on a network trained on long fixed-length input. Evaluations of two…
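As a rough illustration of the oversubtraction problem, the toy sketch below applies magnitude spectral subtraction to a single frame with four frequency bins. It assumes that magnitudes add and that oversubtraction arises because the robot's ego speech estimate is too large, modelled here by a hypothetical factor `alpha`. The numbers and the function name `spectral_subtract` are made up for illustration and are not taken from the paper.

```python
import numpy as np

def spectral_subtract(mixture_mag, ego_mag, alpha=1.0, floor=0.0):
    """Subtract a scaled ego-speech magnitude estimate and clamp at `floor`."""
    return np.maximum(mixture_mag - alpha * ego_mag, floor)

# Toy magnitudes for one frame; bins 0-1 stand in for the low (FFR-like)
# range where human and robot speech overlap most strongly (assumption).
human = np.array([1.0, 0.8, 0.3, 0.2])
robot = np.array([1.2, 0.9, 0.1, 0.0])
mixture = human + robot  # idealised additive magnitudes (assumption)

# A perfect estimate recovers the human magnitudes.
est_exact = spectral_subtract(mixture, robot, alpha=1.0)

# An overestimate wipes out the low bins: the human energy there is lost.
est_over = spectral_subtract(mixture, robot, alpha=2.0)
```

In this toy case the overestimated subtraction clamps both low bins to the floor while the higher bins retain some energy, which is the kind of gap an enhancement model would then need to fill from the remaining information.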
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Social Robot Interaction and HRI
