Robust Recognition of Simultaneous Speech by a Mobile Robot
Jean-Marc Valin, Shun'ichi Yamamoto, Jean Rouat, Francois Michaud, Kazuhiro Nakadai, Hiroshi G. Okuno

TL;DR
This paper presents a robust speech recognition system for a mobile robot that effectively separates and recognizes simultaneous speech from multiple speakers using microphone arrays, source separation, and missing feature techniques.
Contribution
The paper combines Geometric Source Separation, post-filtering, and missing feature theory to improve the recognition of simultaneous speech in real-time robotic applications.
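To make the separation stage more concrete, here is a minimal sketch of one adaptation step of Geometric Source Separation for a single frequency bin. It assumes the commonly used GSS cost, which combines output decorrelation (the off-diagonal energy of the output correlation matrix) with a geometric constraint W A ≈ I, where A holds steering vectors toward known source directions. The uniform linear array, the step size `mu`, the weight `alpha`, the single-snapshot correlation estimate, and all helper names are hypothetical and chosen for illustration. This is not the authors' implementation, and the post-filter is omitted.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two complex matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hermitian(a):
    """Conjugate transpose of a list-of-rows matrix."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def steering_matrix(n_mics, angles_deg, spacing_over_wavelength=0.5):
    """Far-field steering vectors for a HYPOTHETICAL uniform linear array.

    A[m][s] is the phase of source s at microphone m for one frequency bin.
    """
    return [[cmath.exp(-2j * math.pi * spacing_over_wavelength * m
                       * math.sin(math.radians(theta)))
             for theta in angles_deg]
            for m in range(n_mics)]


def _terms(W, A, x):
    """Correlation and error matrices shared by the cost and the gradient."""
    X = [[v] for v in x]                          # snapshot as a column vector
    Rxx = matmul(X, hermitian(X))                 # instantaneous estimate x x^H
    Ryy = matmul(matmul(W, Rxx), hermitian(W))    # output correlation W Rxx W^H
    n = len(Ryy)
    E = [[Ryy[i][j] if i != j else 0j for j in range(n)] for i in range(n)]  # off-diagonal part
    WA = matmul(W, A)
    C = [[WA[i][j] - (1.0 if i == j else 0.0) for j in range(len(WA[0]))]
         for i in range(len(WA))]                 # geometric constraint error W A - I
    return Rxx, E, C


def gss_cost(W, A, x, alpha=1.0):
    """alpha * ||offdiag(Ryy)||^2 + ||W A - I||^2 for one frequency bin."""
    _, E, C = _terms(W, A, x)
    j1 = sum(abs(v) ** 2 for row in E for v in row)
    j2 = sum(abs(v) ** 2 for row in C for v in row)
    return alpha * j1 + j2


def gss_step(W, A, x, mu=1e-4, alpha=1.0):
    """One steepest-descent step along the Wirtinger gradient dJ/dW*."""
    Rxx, E, C = _terms(W, A, x)
    g1 = matmul(matmul(E, W), Rxx)                # dJ1/dW* = 2 E W Rxx
    g2 = matmul(C, hermitian(A))                  # dJ2/dW* = (W A - I) A^H
    return [[W[i][j] - mu * (alpha * 2 * g1[i][j] + g2[i][j]) for j in range(len(W[0]))]
            for i in range(len(W))]


# Usage: two sources at 0 and 20 degrees, four microphones,
# demixing matrix initialised as a delay-and-sum beamformer.
A = steering_matrix(4, [0.0, 20.0])
W = [[v / 4 for v in row] for row in hermitian(A)]
x = [A[m][0] * 1.0 + A[m][1] * 0.5 for m in range(4)]
W = gss_step(W, A, x)
```

Setting `alpha` to 0 leaves only the geometric term, whose error W A - I is multiplied by (I - mu A^H A) at every step; the decorrelation term is what adapts W to the observed signals rather than to the array model alone. A real-time system would run such updates in every frequency bin and would typically average correlation estimates over frames instead of using a single snapshot.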
Findings
Reduced recognition error by 24% with the post-filter, compared to microphone array source separation alone.
Reduced recognition error by 42% when missing feature recognition was combined with the post-filter.
Recognized speech from three simultaneous Japanese speakers with sources separated by 10 to 90 degrees in azimuth.
Abstract
This paper describes a system that gives a mobile robot the ability to perform automatic speech recognition with simultaneous speakers. A microphone array is used along with a real-time implementation of Geometric Source Separation and a post-filter that gives a further reduction of interference from other sources. The post-filter is also used to estimate the reliability of spectral features and compute a missing feature mask. The mask is used in a missing feature theory-based speech recognition system to recognize the speech from simultaneous Japanese speakers in the context of a humanoid robot. Recognition rates are presented for three simultaneous speakers located at 2 meters from the robot. The system was evaluated on a 200-word vocabulary at different azimuths between sources, ranging from 10 to 90 degrees. Compared to the use of the microphone array source separation alone, we…
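The abstract describes using the post-filter's interference estimates to judge the reliability of each spectral feature and to build a missing feature mask for the recognizer. Below is a minimal sketch of that idea, assuming a simple hypothetical rule (a band counts as reliable when the estimated speech energy exceeds half of speech plus leaked interference plus stationary noise) and missing-feature marginalization for a single diagonal-covariance Gaussian. The function names, the 0.5 threshold, and the energy estimates are illustrative; they are not the paper's exact mask formula or recognizer.

```python
import math


def missing_feature_mask(speech, leak, noise, threshold=0.5):
    """Binary reliability mask from per-band energy estimates.

    speech[i]: estimated speech energy in band i (hypothetical post-filter output)
    leak[i]:   estimated interference leaking in from other separated sources
    noise[i]:  estimated stationary background noise
    The ratio-and-threshold rule is a HYPOTHETICAL choice, not the paper's formula.
    """
    mask = []
    for s, l, n in zip(speech, leak, noise):
        total = s + l + n
        reliability = s / total if total > 0 else 0.0
        mask.append(1 if reliability > threshold else 0)
    return mask


def masked_log_likelihood(features, mask, means, variances):
    """Log-likelihood under one diagonal Gaussian, marginalizing unreliable bands.

    With a diagonal covariance, integrating out a dimension removes its factor,
    so unreliable (mask == 0) dimensions are simply skipped.
    """
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Usage: only the first band is dominated by the target speech.
mask = missing_feature_mask([4.0, 1.0, 0.1], [0.5, 2.0, 0.2], [0.5, 0.5, 0.2])
score = masked_log_likelihood([0.0, 5.0, 5.0], mask, [0.0] * 3, [1.0] * 3)
```

Marginalization keeps corrupted bands from dragging down the score of the correct model; a full recognizer would apply the same masking inside every mixture component of every HMM state.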
