TL;DR
This paper introduces a lightweight end-to-end CNN that maps 2D speech features directly to prosthetic hand trajectories, enabling real-time control on embedded GPGPU devices without intermediate speech-to-text conversion.
Contribution
It presents a novel end-to-end CNN approach for speech-to-trajectory mapping in prosthetic hands, bypassing traditional speech recognition steps and optimized for embedded GPGPU hardware.
Findings
The model achieved a root-mean-square error of 0.119 in trajectory prediction.
The CNN runs in 20 ms on an NVIDIA Jetson TX2, enabling real-time control.
The method is compatible with various 2D speech features, such as spectrograms, MFCCs, or PNCCs.
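
To make two terms from the findings concrete, here is a minimal, dependency-free sketch of a 2D speech feature (a plain magnitude spectrogram) and of the root-mean-square error metric. It is not the paper's implementation: the function names, the frame length, the hop size, and the unwindowed naive DFT are illustrative assumptions, and the summary does not specify how the paper frames its audio or what units the reported error of 0.119 is in.

```python
import cmath
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("sequences must be non-empty and the same length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def magnitude_spectrogram(signal, frame_len=8, hop=4):
    """Return a 2D list indexed [frame][frequency bin] of DFT magnitudes.

    Hypothetical helper: frame_len and hop are arbitrary illustrative values,
    and no window function is applied, which a real front end usually would.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        row = []
        # Keep only the non-negative frequency bins of a real-valued signal.
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            row.append(abs(coeff))
        frames.append(row)
    return frames


# A toy tone with two cycles per 8-sample frame stands in for speech audio.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
spec = magnitude_spectrogram(tone)
```

The resulting `spec` is a frames-by-bins grid, which is the kind of 2D input a CNN can consume like an image; MFCC and PNCC features produce grids of the same general shape with different per-frame coefficients.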
Abstract
Speech is one of the most common forms of communication in humans. Speech commands are an essential part of multimodal control of prosthetic hands. In past decades, researchers have used automatic speech recognition systems to control prosthetic hands with speech commands. Automatic speech recognition systems learn how to map human speech to text. Researchers then used natural language processing or a look-up table to map the estimated text to a trajectory. However, the performance of conventional speech-controlled prosthetic hands is still unsatisfactory. Recent advancements in general-purpose graphics processing units (GPGPUs) enable intelligent devices to run deep neural networks in real time. Thus, architectures of intelligent systems have rapidly transformed from the paradigm of composite subsystems optimization to the paradigm of end-to-end optimization. In this paper, we…
