A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
Yue Li, Florian A. Kunneman, Koen V. Hindriks

TL;DR
This paper presents a near-real-time audio processing pipeline that filters out the robot's own speech, so that human users can interrupt the robot during interaction. It combines a neural network with spectral subtraction on single-channel microphone data.
Contribution
The authors develop and evaluate a novel near-real-time speech filtering pipeline that outperforms previous methods and integrates into a robot framework for improved human-robot interaction.
Findings
The pipeline outperforms previous approaches and state-of-the-art systems.
It is effective at extracting human speech during human-robot interaction.
Feasibility is demonstrated in a real-world robot setup.
Abstract
Current state-of-the-art automatic speech recognition (ASR) systems cannot transcribe overlapping speech audio streams separately. Consequently, when these ASR systems are used as part of a social robot like Pepper for interaction with a human, it is common practice to close the robot's microphone while the robot itself is talking. This prevents human users from interrupting the robot, which limits speech-based human-robot interaction. To enable a more natural interaction that allows for such interruptions, we propose an audio processing pipeline for filtering out the robot's ego speech using only a single-channel microphone. This pipeline takes advantage of the possibility to feed the robot ego speech signal, generated by a text-to-speech API, as training data into a machine learning model. The proposed pipeline combines a convolutional neural network and spectral subtraction…
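The abstract names spectral subtraction as one of the pipeline's two components but does not say how it is combined with the convolutional neural network. As a rough illustration of the general technique only, not the authors' implementation, the sketch below subtracts the magnitude spectrum of an ego-speech estimate from the magnitude spectrum of the microphone mixture and resynthesizes with the mixture's phase. It assumes the ego-speech estimate is already available and time-aligned with the mixture; the function names, the frame length of 512 samples, and the spectral floor of 0.05 are all hypothetical choices made for this example.

```python
import numpy as np

FRAME_LEN = 512        # samples per analysis frame (assumed value)
HOP = FRAME_LEN // 2   # 50% overlap, so periodic Hann windows sum to 1


def _periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 add up to exactly 1."""
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)


def stft(signal):
    """Windowed short-time Fourier transform, shape (n_frames, FRAME_LEN//2 + 1)."""
    window = _periodic_hann(FRAME_LEN)
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    frames = np.stack(
        [signal[i * HOP : i * HOP + FRAME_LEN] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spectrum):
    """Inverse STFT by plain overlap-add (first and last HOP samples stay attenuated)."""
    frames = np.fft.irfft(spectrum, n=FRAME_LEN, axis=1)
    out = np.zeros(FRAME_LEN + HOP * (len(frames) - 1))
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + FRAME_LEN] += frame
    return out


def spectral_subtract(mixture, ego_estimate, floor=0.05):
    """Remove an ego-speech estimate from a single-channel mixture.

    Assumption: `ego_estimate` has the same length as `mixture` and is
    time-aligned with it. `floor` keeps a small fraction of the mixture
    magnitude so that no bin is driven below zero.
    """
    mix_spec = stft(mixture)
    mix_mag = np.abs(mix_spec)
    ego_mag = np.abs(stft(ego_estimate))
    clean_mag = np.maximum(mix_mag - ego_mag, floor * mix_mag)
    # Reuse the mixture phase; only magnitudes are modified.
    return istft(clean_mag * np.exp(1j * np.angle(mix_spec)))
```

Calling `spectral_subtract(mixture, ego_estimate)` on two equal-length NumPy arrays returns the filtered waveform, slightly shorter than the input because any trailing partial frame is dropped. In a real robot setup the ego-speech signal reaching the microphone is altered by the loudspeaker and room acoustics, which is presumably where a learned model would come in; this sketch ignores that entirely.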
Taxonomy
Topics: Robotics and Automated Systems · Speech and dialogue systems · Social Robot Interaction and HRI
