Directional Source Separation for Robust Speech Recognition on Smart Glasses

Tiantian Feng; Ju Lin; Yiteng Huang; Weipeng He; Kaustubh Kalgaonkar; Niko Moritz; Li Wan; Xin Lei; Ming Sun; Frank Seide

arXiv:2309.10993 · cs.SD · June 17, 2025 · 2 cites

Open Access

TL;DR

This paper explores directional source separation techniques, including neural beamforming, to enhance speech recognition accuracy on smart glasses in noisy environments, demonstrating improved ASR performance for the user.

Contribution

It introduces neural beamforming for source separation and joint training with ASR, advancing noise robustness in smart glasses speech recognition systems.

Findings

01. Directional source separation improves ASR for the wearer.

02. Neural beamforming effectively learns directional characteristics.

03. Joint training yields the best overall ASR performance.

Abstract

Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcription and captioning services, considerably enriching human experiences in daily communication. However, such systems frequently struggle with environmental noise, which degrades speech recognition and speaker change detection. To improve voice quality, this work investigates directional source separation using a multi-microphone array. We first explore multiple beamformers to assist source separation modeling by strengthening the directional properties of speech signals. In addition to relying on predetermined beamformers, we investigate neural beamforming in multi-channel source separation, demonstrating that automatically learning directional characteristics effectively improves separation quality. We further compare the ASR performance…
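The abstract does not say which predetermined beamformers were used. As a generic illustration of how a fixed beamformer strengthens the directional properties of a signal, the sketch below implements a plain frequency-domain delay-and-sum beamformer in Python with NumPy. It is not the paper's method: the far-field plane-wave model, the speed of sound, the array geometry, and the names `steering_delays` and `delay_and_sum` are all assumptions made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second; assumed value for air at room temperature


def steering_delays(mic_positions, direction):
    """Arrival-time offsets (seconds) per microphone for a far-field plane wave.

    mic_positions: (M, 3) array of microphone coordinates in metres (hypothetical geometry).
    direction: 3-vector pointing from the array toward the source.
    """
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    # A microphone lying further along `direction` is closer to the source,
    # so the wavefront reaches it earlier (negative offset).
    return -(np.asarray(mic_positions, dtype=float) @ direction) / SPEED_OF_SOUND


def delay_and_sum(signals, mic_positions, direction, sample_rate):
    """Steer the array toward `direction` and average the time-aligned channels.

    signals: (M, T) array, one row per microphone.
    Returns a length-T array. Alignment is done with FFT phase shifts, so the
    shifts are circular; real systems would process short overlapping frames.
    """
    signals = np.asarray(signals, dtype=float)
    num_samples = signals.shape[1]
    delays = steering_delays(mic_positions, direction)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    # Undo each channel's delay: a delay of tau multiplies the spectrum by
    # exp(-2j*pi*f*tau), so multiplying by exp(+2j*pi*f*tau) cancels it.
    compensation = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = spectra * compensation
    return np.fft.irfft(aligned.mean(axis=0), n=num_samples)
```

Signals arriving from the steered direction add coherently after alignment, while signals from other directions are summed with mismatched phases and are attenuated. A neural beamformer, as investigated in the paper, would learn the channel-combination weights from data rather than derive them from a fixed geometry.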

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing