Identifying the Desired Word Suggestion in Simultaneous Audio
Dylan Gaines, Keith Vertanen

TL;DR
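This study investigates how presenting word suggestions in simultaneous voices affects a user's ability to identify their desired word during non-visual text input. It finds that staggering word onsets by 0.15 s lets two suggestions be presented 32% faster than sequential playback with no significant loss in accuracy.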
Contribution
It introduces a method for presenting word suggestions with simultaneous voices, using a slight delay between the start of each word to limit the loss in accuracy.
Findings
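Adding a 0.15 s delay between word onsets lets two simultaneous words be presented with no significant decrease in accuracy compared to sequential presentation.
Presenting two words simultaneously with the delay is 32% faster than sequential playback.
With the delay, accuracy for two words is 84%, versus 86% for sequential presentation.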
Abstract
We explore a method for presenting word suggestions for non-visual text input using simultaneous voices. We conduct two perceptual studies and investigate the impact of different presentations of voices on a user's ability to detect which voice, if any, spoke their desired word. Our sets of words simulated the word suggestions of a predictive keyboard during real-world text input. We find that when voices are simultaneous, user accuracy decreases significantly with each added word suggestion. However, adding a slight 0.15 s delay between the start of each subsequent word allows two simultaneous words to be presented with no significant decrease in accuracy compared to presenting two words sequentially (84% simultaneous versus 86% sequential). This allows two word suggestions to be presented to the user 32% faster than sequential playback without decreasing accuracy.
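The timing idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names and the word durations used here are hypothetical, and only the 0.15 s onset delay comes from the abstract. The sketch compares how long a set of suggestions takes to play when each word starts a fixed delay after the previous one, versus playing the words back to back.

```python
def onset_times(n_words, delay):
    """Start time of each word when every word begins `delay` seconds
    after the previous one started (hypothetical helper)."""
    return [i * delay for i in range(n_words)]


def staggered_duration(durations, delay):
    """Total playback time when words overlap with staggered onsets."""
    if not durations:
        return 0.0
    starts = onset_times(len(durations), delay)
    # Playback ends when the last-finishing word ends.
    return max(start + dur for start, dur in zip(starts, durations))


def sequential_duration(durations, gap=0.0):
    """Total playback time when words are played one after another.
    `gap` is an assumed pause between words, not a value from the paper."""
    if not durations:
        return 0.0
    return sum(durations) + gap * (len(durations) - 1)


def time_saved_fraction(durations, delay, gap=0.0):
    """Fraction of sequential playback time saved by staggered playback."""
    seq = sequential_duration(durations, gap)
    if seq == 0:
        return 0.0
    return (seq - staggered_duration(durations, delay)) / seq


# Hypothetical example: two words of 0.5 s each, with the 0.15 s delay.
word_durations = [0.5, 0.5]
saving = time_saved_fraction(word_durations, delay=0.15)
print(f"staggered playback saves {saving:.0%} of sequential time")
```

The saving depends on the word durations, so this illustrative figure should not be expected to match the 32% reported in the study, which reflects the actual recordings used.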
Taxonomy
Topics: Speech and Audio Processing
