DualVoice: Speech Interaction that Discriminates between Normal and Whispered Voice Input

Jun Rekimoto

arXiv:2208.10499 · cs.HC · August 24, 2022

TL;DR

DualVoice is a speech interaction system that distinguishes between normal and whispered voice inputs, enabling command input via whisper and text input via normal speech, improving accuracy and usability without extra hardware.

Contribution

This paper introduces a novel speech interaction method that discriminates between normal and whispered voices using neural networks, facilitating accurate command and text input without specialized hardware.

Findings

01 Successfully discriminates between normal and whispered speech.

02 Enables hands-free command and text input using only a standard microphone.

03 Prototype demonstrates practical application of the method.

Abstract

Interactions based on automatic speech recognition (ASR) have become widely used, and speech input is increasingly used to create documents. However, because there is no easy way to distinguish spoken commands from spoken text that should be entered, misrecognitions are difficult to identify and correct, so documents must be edited and corrected manually. Entering symbols and commands is also challenging because they may be misrecognized as literal text. To address these problems, this study proposes a speech interaction method called DualVoice, in which commands are input in a whispered voice and text in a normal voice. The proposed method requires no specialized hardware other than a regular microphone, enabling completely hands-free interaction. The method can be used in a wide range of situations where speech recognition is already…
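
The routing idea in the abstract (whispered utterances act as commands, normally voiced utterances become text) can be illustrated with a minimal sketch. This is not the paper's implementation: the `Segment` and `Editor` types, the `route` function, and the three-command vocabulary are all hypothetical, and the `whispered` flag stands in for the output of whatever whisper/normal classifier is used upstream.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    text: str        # transcript of one utterance, from any ASR system
    whispered: bool  # assumed classifier output: True if the utterance was whispered


@dataclass
class Editor:
    words: list = field(default_factory=list)

    def insert(self, text: str) -> None:
        # Normal-voice speech is appended to the document as plain words.
        self.words.extend(text.split())

    def run_command(self, command: str) -> None:
        # Tiny illustrative command set; not taken from the paper.
        if command == "delete word" and self.words:
            self.words.pop()
        elif command == "comma" and self.words:
            self.words[-1] += ","
        elif command == "new line":
            self.words.append("\n")


def route(segments, editor):
    """Send whispered segments to the command handler and the rest to text input."""
    for seg in segments:
        if seg.whispered:
            editor.run_command(seg.text.lower().strip())
        else:
            editor.insert(seg.text)
    return " ".join(editor.words)


demo = [
    Segment("hello world", whispered=False),
    Segment("comma", whispered=True),         # whispered: inserts punctuation
    Segment("this is a tests", whispered=False),
    Segment("delete word", whispered=True),   # whispered: removes "tests"
    Segment("test", whispered=False),
]
print(route(demo, Editor()))  # hello world, this is a test
```

Because the voicing mode alone selects the handler, the word "comma" spoken in a normal voice is inserted as the literal word, while the same word whispered produces the punctuation mark.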
