Personalizing Keyword Spotting with Speaker Information

Beltrán Labrador; Pai Zhu; Guanlong Zhao; Angelo Scorza Scarpati; Quan Wang; Alicia Lozano-Diez; Alex Park; Ignacio López Moreno

arXiv:2311.03419 · eess.AS · November 8, 2023 · 2 cites

Open Access

TL;DR

This paper improves keyword spotting accuracy across diverse speakers by conditioning the keyword spotting model on speaker information through Feature-wise Linear Modulation (FiLM), at minimal additional computational cost.

Contribution

We propose a novel FiLM-based approach that incorporates speaker information into keyword spotting, improving performance for underrepresented speaker groups with only a 1% increase in the number of parameters.

Findings

01 Substantial improvement in keyword detection accuracy on a diverse dataset, particularly among underrepresented speaker groups.

02 Speaker information can be extracted from both the input audio and pre-enrolled user audio.

03 Minimal impact on latency and computational cost.

Abstract

Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker recognition systems to extract speaker information, and we experiment with extracting this information from both the input audio and pre-enrolled user audio. We evaluate our systems on a diverse dataset and achieve a substantial improvement in keyword detection accuracy, particularly among underrepresented speaker groups. Moreover, our proposed approach requires only a 1% increase in the number of parameters, with minimal impact on latency and computational cost, which makes it a…
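
To make the FiLM mechanism named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. FiLM scales and shifts each feature channel using parameters predicted from a conditioning vector, which here is a speaker embedding. The function name `film`, the single linear projections, the tensor shapes, and the identity initialization are all assumptions made for illustration; the abstract does not specify where in the model the modulation is applied.

```python
import numpy as np

def film(features, speaker_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise Linear Modulation: per-channel scale and shift.

    features:          (frames, channels) activations from a keyword spotting layer
    speaker_embedding: (embed_dim,) vector from a speaker recognition model
    w_gamma, w_beta:   (embed_dim, channels) projection weights (hypothetical names)
    b_gamma, b_beta:   (channels,) projection biases (hypothetical names)
    """
    gamma = speaker_embedding @ w_gamma + b_gamma  # (channels,) per-channel scale
    beta = speaker_embedding @ w_beta + b_beta     # (channels,) per-channel shift
    return gamma * features + beta                 # broadcasts over the frame axis

# Toy shapes chosen only for the example.
rng = np.random.default_rng(0)
frames, channels, embed_dim = 5, 8, 4
features = rng.standard_normal((frames, channels))
speaker_embedding = rng.standard_normal(embed_dim)

# Assumed identity initialization: gamma = 1 and beta = 0 for every speaker,
# so the modulated features start out equal to the unmodulated ones.
w_gamma = np.zeros((embed_dim, channels))
b_gamma = np.ones(channels)
w_beta = np.zeros((embed_dim, channels))
b_beta = np.zeros(channels)

modulated = film(features, speaker_embedding, w_gamma, b_gamma, w_beta, b_beta)
```

In this sketch the only added parameters are the two projections, `2 * (embed_dim * channels + channels)` values in total, which is one way a FiLM layer can stay small relative to the model it conditions. With the identity initialization above, `modulated` equals `features` until the projections are trained.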

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing