SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization
Artem Dementyev, Dimitri Kanevsky, Samuel J. Yang, Mathieu Parvaix, Chiong Lai, and Alex Olwal

TL;DR
SpeechCompass improves mobile speech-to-text by incorporating real-time speaker localization and directional guidance, enhancing group conversation clarity and user experience through novel algorithms and hardware.
Contribution
The paper introduces a real-time multi-microphone speech localization system with custom hardware and algorithms, enabling directional guidance in mobile captioning.
Findings
Participants valued directional guidance for group conversations.
Localization and diarization improved speech clarity and user understanding.
Hardware and algorithms achieved efficient real-time performance.
Abstract
Speech-to-text capabilities on mobile devices have proven helpful for hearing and speech accessibility, language translation, note-taking, and meeting transcripts. However, our foundational large-scale survey (n=263) shows that the inability to distinguish and indicate speaker direction makes them challenging in group conversations. SpeechCompass addresses this limitation through real-time, multi-microphone speech localization, where the direction of speech allows visual separation and guidance (e.g., arrows) in the user interface. We introduce efficient real-time audio localization algorithms and custom sound perception hardware running on a low-power microcontroller and four integrated microphones, which we characterize in technical evaluations. Informed by a large-scale survey (n=494), we conducted an in-person study of group conversations with eight frequent users of mobile…
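The abstract does not spell out the localization algorithm, so the following is only a generic sketch of how a direction can be derived from two microphones: find the time difference of arrival (TDOA) by cross-correlation, then convert it to an angle with the far-field relation sin(theta) = c * tau / d. The function names, the 48 kHz sample rate, the 0.1 m microphone spacing, and the synthetic test signal are all assumptions for illustration, not details from the paper.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive result means sig_b is a delayed copy of sig_a.
    Brute-force cross-correlation; fine for a short illustrative frame.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m):
    """Convert a TDOA to an arrival angle relative to broadside (far-field)."""
    tau = delay_samples / sample_rate_hz
    # Clamp so noise-induced delays beyond the physical limit do not break asin.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical setup: two microphones 0.1 m apart, sampled at 48 kHz.
SAMPLE_RATE_HZ = 48000
MIC_SPACING_M = 0.1
max_lag = math.ceil(MIC_SPACING_M * SAMPLE_RATE_HZ / SPEED_OF_SOUND)

# Synthetic noise burst; the second microphone hears it 7 samples later.
rng = random.Random(0)
mic_a = [rng.gauss(0.0, 1.0) for _ in range(512)]
true_delay = 7
mic_b = [0.0] * true_delay + mic_a[:-true_delay]

lag = estimate_delay_samples(mic_a, mic_b, max_lag)
angle = delay_to_angle_deg(lag, SAMPLE_RATE_HZ, MIC_SPACING_M)
print(f"estimated lag: {lag} samples, angle: {angle:.1f} degrees")
```

With more than two microphones, pairwise estimates like this one can be combined to resolve direction over a full 360 degrees; real-time systems typically replace the brute-force loop with an FFT-based correlation.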
Taxonomy
Topics: Speech and Audio Processing · Subtitles and Audiovisual Media · Speech Recognition and Synthesis
