A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction

Yue Li; Florian A. Kunneman; Koen V. Hindriks

arXiv:2405.13477 · cs.HC · May 24, 2024

TL;DR

This paper presents a real-time audio processing pipeline that filters out a robot's own speech, enabling natural human-robot interactions with interruptions, using a neural network and spectral subtraction on single-channel microphone data.

Contribution

The authors develop and evaluate a novel near-real-time speech filtering pipeline that outperforms previous methods and integrates into a robot framework for improved human-robot interaction.

Findings

01

Pipeline outperforms previous approaches and state-of-the-art systems.

02

Effective in extracting human speech during robot interaction.

03

Demonstrated feasibility in a real-world robot setup.

Abstract

With current state-of-the-art automatic speech recognition (ASR) systems, it is not possible to transcribe overlapping speech audio streams separately. Consequently, when these ASR systems are used as part of a social robot like Pepper for interaction with a human, it is common practice to close the robot's microphone while it is talking itself. This prevents human users from interrupting the robot, which limits speech-based human-robot interaction. To enable a more natural interaction that allows for such interruptions, we propose an audio processing pipeline for filtering out the robot's ego speech using only a single-channel microphone. This pipeline takes advantage of the possibility to feed the robot's ego speech signal, generated by a text-to-speech API, as training data into a machine learning model. The proposed pipeline combines a convolutional neural network and spectral subtraction…
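The abstract names spectral subtraction as one stage of the pipeline but is cut off before it explains how that stage and the convolutional neural network fit together. As a rough illustration of the generic technique only, the sketch below applies textbook magnitude spectral subtraction to a single-channel mixture, assuming an idealized, sample-aligned copy of the robot's text-to-speech output is available as the reference. The function names (`stft`, `istft`, `spectral_subtract`) and parameters (`alpha`, `floor`, `n_fft`, `hop`) are hypothetical choices for this example, not the authors' code. In a real microphone recording the reference would differ from the clean TTS signal (delay, room acoustics, loudspeaker coloration); this sketch does not model that.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a periodic Hann window; frames are rows."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128, length=None):
    """Inverse STFT via weighted overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros((frames.shape[0] - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame * window
        norm[i * hop : i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)  # avoid division by zero at the very edges
    if length is not None:
        out = out[:length] if len(out) >= length else np.pad(out, (0, length - len(out)))
    return out


def spectral_subtract(mixture, ego_reference, alpha=1.0, floor=0.01, n_fft=512, hop=128):
    """Hypothetical helper: remove ego speech by subtracting its magnitude spectrum.

    Assumes ego_reference is time-aligned with, and scaled like, the ego speech
    contained in the mixture. The mixture phase is reused for resynthesis.
    """
    mix_spec = stft(mixture, n_fft, hop)
    ref_spec = stft(ego_reference, n_fft, hop)
    mag = np.abs(mix_spec) - alpha * np.abs(ref_spec)
    # Spectral floor: keep a small fraction of the mixture magnitude instead of
    # clipping to zero, a common way to limit "musical noise" artifacts.
    mag = np.maximum(mag, floor * np.abs(mix_spec))
    cleaned_spec = mag * np.exp(1j * np.angle(mix_spec))
    return istft(cleaned_spec, n_fft, hop, length=len(mixture))


# Toy usage: a 300 Hz tone stands in for the human, a 1200 Hz tone for the robot.
fs = 16000
t = np.arange(fs) / fs
human = 0.5 * np.sin(2 * np.pi * 300 * t)
ego = 0.8 * np.sin(2 * np.pi * 1200 * t)
cleaned = spectral_subtract(human + ego, ego)
```

Pure tones in non-overlapping frequency bands are the easiest possible case for this method; real human and robot speech overlap in frequency, which is one reason a plain subtraction step alone would not be expected to suffice.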

Taxonomy

Topics: Robotics and Automated Systems · Speech and dialogue systems · Social Robot Interaction and HRI