Augmenting Bottleneck Features of Deep Neural Network Employing Motor State for Speech Recognition at Humanoid Robots

Moa Lee; Joon Hyuk Chang

arXiv:1808.08702 · cs.SD · August 28, 2018

TL;DR

This paper introduces a noise-robust speech recognition system for humanoid robots that uses motor state information to enhance bottleneck features in DNNs, significantly improving phoneme recognition accuracy.

Contribution

It proposes a novel method integrating motor on/off states into bottleneck feature extraction for DNN-based speech recognition in noisy robotic environments.

Findings

1. Achieves an 11% relative improvement in phoneme error rate (PER) on TIMIT.

2. Demonstrates the robustness of the proposed system against ego-noise in humanoid robots.
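The summary does not name the baseline system, but assuming the conventional definition, a relative improvement in phoneme error rate (PER) is measured against the baseline's PER rather than in absolute percentage points:

```latex
\text{relative improvement} = \frac{\mathrm{PER}_{\text{baseline}} - \mathrm{PER}_{\text{proposed}}}{\mathrm{PER}_{\text{baseline}}} \times 100\%
```

Under this definition, an 11% relative improvement means the proposed system's PER is about 0.89 times the baseline's PER.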

Abstract

For humanoid robots, internal noise generated by motors, fans, and mechanical components while the robot moves or shakes its body severely degrades speech recognition accuracy. In this paper, a novel speech recognition system robust to this ego-noise is proposed for humanoid robots, in which the on/off state of the motor is employed as auxiliary information for finding the relevant input features. For this, we consider bottleneck features, which have been successfully applied to deep neural network (DNN) based automatic speech recognition (ASR) systems. When learning the bottleneck features, we first feed the motor on/off state data as supplementary information, together with the acoustic features, into the first DNN for preliminary acoustic modeling. Then, the second DNN for primary acoustic modeling…
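The two-stage pipeline described in the abstract can be sketched as a forward pass in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the encoding of the motor state as a single 0/1 value appended to every acoustic frame, the position of the linear bottleneck layer, and the choice to feed only the bottleneck features into the second DNN are all hypothetical (the abstract is truncated before it says what the second DNN consumes). Weights are random, so training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights; a real system would learn these by backpropagation.
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes, not taken from the paper.
N_ACOUSTIC = 440    # e.g. 40 filterbank coefficients x 11 context frames
N_HIDDEN = 512
N_BOTTLENECK = 40
N_PHONES = 48       # e.g. a 48-phone TIMIT training set

# First DNN (preliminary acoustic model):
# [acoustic | motor state] -> hidden -> linear bottleneck -> hidden -> phone posteriors
W1, b1 = init_layer(N_ACOUSTIC + 1, N_HIDDEN)   # +1 input for the motor on/off flag
Wb, bb = init_layer(N_HIDDEN, N_BOTTLENECK)
W2, b2 = init_layer(N_BOTTLENECK, N_HIDDEN)
Wo, bo = init_layer(N_HIDDEN, N_PHONES)

def first_dnn(acoustic, motor_on):
    """Return (bottleneck features, phone posteriors) for a batch of frames."""
    motor = motor_on.astype(float)[:, None]        # (frames, 1); 1.0 means motor on
    x = np.concatenate([acoustic, motor], axis=1)  # motor-augmented input vector
    h = relu(x @ W1 + b1)
    bn = h @ Wb + bb                               # linear bottleneck layer output
    h2 = relu(bn @ W2 + b2)
    return bn, softmax(h2 @ Wo + bo)

# Second DNN (primary acoustic model), assumed here to take bottleneck features only.
V1, c1 = init_layer(N_BOTTLENECK, N_HIDDEN)
Vo, co = init_layer(N_HIDDEN, N_PHONES)

def second_dnn(bn):
    """Map bottleneck features to phone posteriors."""
    return softmax(relu(bn @ V1 + c1) @ Vo + co)

# Usage: 8 frames of random "acoustic features" with the motor switched on mid-utterance.
frames = 8
acoustic = rng.standard_normal((frames, N_ACOUSTIC))
motor_on = np.array([0, 0, 1, 1, 1, 1, 0, 0], dtype=bool)
bn, _ = first_dnn(acoustic, motor_on)
post = second_dnn(bn)
print(bn.shape, post.shape)  # prints: (8, 40) (8, 48)
```

In typical bottleneck-feature systems, the first network is trained on phone (or senone) targets, the layers above the bottleneck are discarded after training, and the extracted bottleneck activations become the input features of the second acoustic model.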

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing