SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization

Artem Dementyev; Dimitri Kanevsky; Samuel J. Yang; Mathieu Parvaix; Chiong Lai; and Alex Olwal

arXiv:2502.08848·cs.HC·March 6, 2025

TL;DR

SpeechCompass improves mobile speech-to-text by incorporating real-time speaker localization and directional guidance, enhancing group conversation clarity and user experience through novel algorithms and hardware.

Contribution

The paper introduces a real-time multi-microphone speech localization system with custom hardware and algorithms, enabling directional guidance in mobile captioning.

Findings

01 Participants valued directional guidance for group conversations.

02 Localization and diarization improved speech clarity and user understanding.

03 Hardware and algorithms achieved efficient real-time performance.

Abstract

Speech-to-text capabilities on mobile devices have proven helpful for hearing and speech accessibility, language translation, note-taking, and meeting transcripts. However, our foundational large-scale survey (n=263) shows that the inability to distinguish and indicate speaker direction makes them challenging in group conversations. SpeechCompass addresses this limitation through real-time, multi-microphone speech localization, where the direction of speech allows visual separation and guidance (e.g., arrows) in the user interface. We introduce efficient real-time audio localization algorithms and custom sound perception hardware running on a low-power microcontroller and four integrated microphones, which we characterize in technical evaluations. Informed by a large-scale survey (n=494), we conducted an in-person study of group conversations with eight frequent users of mobile…
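The abstract does not spell out the localization algorithm, so the following is only an illustrative sketch of one common way a pair of microphones can yield a direction: estimate the time difference of arrival by cross-correlation, then convert it to an angle under a far-field assumption. The function names, the 16 kHz sample rate, and the 0.1 m microphone spacing are hypothetical choices for illustration and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def estimate_delay_samples(a, b, max_lag):
    """Return the lag (in samples) by which signal b trails signal a.

    Hypothetical helper: picks the lag that maximizes a plain
    time-domain cross-correlation. A positive result means the sound
    reached microphone a first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += a[i] * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing):
    """Convert an inter-microphone delay to an angle of arrival in degrees.

    Assumes a far-field source, so sin(theta) = c * tau / d. The ratio is
    clamped to [-1, 1] because noisy delays can exceed the physical limit.
    """
    tau = lag / sample_rate
    ratio = SPEED_OF_SOUND * tau / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Usage: a single click that arrives 3 samples later at the second microphone.
mic_a = [0.0] * 32
mic_b = [0.0] * 32
mic_a[10] = 1.0
mic_b[13] = 1.0
lag = estimate_delay_samples(mic_a, mic_b, max_lag=5)
angle = delay_to_angle(lag, sample_rate=16000, mic_spacing=0.1)
```

With more than two microphones, several such pairwise angles could in principle be combined to resolve direction over a full circle. How SpeechCompass itself does this is described in the paper, not here.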

Taxonomy

Topics: Speech and Audio Processing · Subtitles and Audiovisual Media · Speech Recognition and Synthesis