PCOV-KWS: Multi-task Learning for Personalized Customizable Open Vocabulary Keyword Spotting

Jianan Pan; Kejie Huang

arXiv:2603.18023 · eess.AS · March 20, 2026

TL;DR

This paper presents PCOV-KWS, a multi-task learning framework for personalized open-vocabulary keyword spotting that improves accuracy and efficiency by combining keyword detection and speaker verification.
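The TL;DR describes one network that both spots a keyword and verifies who said it. The Python below is a minimal sketch of how those two outputs might be combined at inference time. It assumes the model emits a keyword score and a speaker embedding, and that the embedding is compared with an enrolled user's embedding by cosine similarity. The function names and both thresholds are hypothetical illustrations, not details from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalized_trigger(keyword_score: float,
                         speaker_embedding: Sequence[float],
                         enrolled_embedding: Sequence[float],
                         kws_threshold: float = 0.5,   # hypothetical value
                         sv_threshold: float = 0.7     # hypothetical value
                         ) -> bool:
    """Hypothetical decision rule: fire only if the keyword is detected AND
    the current speaker matches the enrolled user."""
    keyword_ok = keyword_score >= kws_threshold
    speaker_ok = cosine_similarity(speaker_embedding, enrolled_embedding) >= sv_threshold
    return keyword_ok and speaker_ok
```

For example, `personalized_trigger(0.9, [1.0, 0.0], [0.9, 0.1])` returns `True`, while the same keyword score with an orthogonal speaker embedding such as `[0.0, 1.0]` against `[1.0, 0.0]` returns `False`. Requiring both checks is what ties a detected keyword to one enrolled user.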

Contribution

It introduces a lightweight multi-task learning approach with a new training criterion that replaces softmax-based loss with multiple binary classifications, improving personalized keyword spotting performance while reducing computational requirements.
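According to the abstract, the training criterion turns multi-class classification into multiple binary classifications, which removes competition between categories. The paper's exact formulation is not given on this page, so the following is only a sketch assuming a plain one-vs-rest setup (one sigmoid and one binary cross-entropy term per class). It contrasts that setup with softmax cross-entropy to show where the inter-category competition comes from. All function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax_probs(logits: Sequence[float]) -> List[float]:
    """Softmax: all classes share one normalizer, so raising any logit
    lowers every other class's probability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_probs(logits: Sequence[float]) -> List[float]:
    """Independent sigmoids: each class probability depends only on its own logit."""
    return [_sigmoid(z) for z in logits]


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Multi-class loss -log softmax(logits)[target], via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def bce_with_logits(z: float, y: float) -> float:
    """Binary cross-entropy on one logit, in the stable form
    max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def one_vs_rest_bce(logits: Sequence[float], target: int) -> float:
    """Sum of independent binary losses: class k is positive only if k == target."""
    return sum(bce_with_logits(z, 1.0 if k == target else 0.0)
               for k, z in enumerate(logits))
```

Going from logits `[2.0, 0.0, 0.0]` to `[2.0, 3.0, 0.0]`, the softmax probability of class 0 drops from about 0.79 to about 0.26, while its sigmoid probability stays at σ(2.0) ≈ 0.88 in both cases. Under the binary formulation each class is scored on its own, which is one way to read "eliminates inter-category competition".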

Findings

01. Outperforms baseline models in accuracy.
02. Requires fewer parameters and less computation.
03. Effective across multiple datasets.

Abstract

As advances in technologies such as the Internet of Things (IoT), Automatic Speech Recognition (ASR), Speaker Verification (SV), and Text-to-Speech (TTS) drive wider use of intelligent voice assistants, the demand for privacy and personalization has grown. In this paper, we introduce a multi-task learning framework for personalized, customizable open-vocabulary Keyword Spotting (PCOV-KWS). The framework uses a lightweight network to perform Keyword Spotting (KWS) and SV simultaneously, addressing the requirements of personalized KWS. We integrate a training criterion distinct from softmax-based loss that transforms multi-class classification into multiple binary classifications, eliminating inter-category competition, and we employ an optimization strategy for multi-task loss weighting during training. We evaluated our PCOV-KWS system on multiple datasets,…
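The abstract mentions an optimization strategy for weighting the KWS and SV losses but does not describe it. Purely as an illustration, the sketch below substitutes a well-known technique, learned homoscedastic-uncertainty weighting (Kendall, Gal and Cipolla), in which each task i gets a learnable log-variance s_i and the total loss is Σ exp(-s_i)·L_i + s_i. This is an assumption, not the PCOV-KWS method, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def uncertainty_weighted_loss(task_losses: Sequence[float],
                              log_vars: Sequence[float]) -> float:
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i.

    s_i is a learnable log-variance per task; the + s_i term stops the
    optimizer from driving every weight exp(-s_i) to zero.
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def grad_log_vars(task_losses: Sequence[float],
                  log_vars: Sequence[float]) -> List[float]:
    """Partial derivative of the combined loss w.r.t. each s_i: 1 - exp(-s_i) * L_i."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(task_losses, log_vars)]


def fit_log_vars(task_losses: Sequence[float],
                 steps: int = 500, lr: float = 0.1) -> List[float]:
    """Gradient descent on the log-variances with the task losses held fixed.

    In real training the losses change every step and s_i is updated jointly
    with the network weights; freezing them here just exposes the fixed point
    s_i = log(L_i), i.e. an effective task weight exp(-s_i) = 1 / L_i.
    """
    log_vars = [0.0] * len(task_losses)
    for _ in range(steps):
        grads = grad_log_vars(task_losses, log_vars)
        log_vars = [s - lr * g for s, g in zip(log_vars, grads)]
    return log_vars
```

With hypothetical task losses of 2.0 (KWS) and 0.5 (SV), `fit_log_vars([2.0, 0.5])` settles near `[0.693, -0.693]`, so the effective weights are about 0.5 and 2.0. The task with the larger current loss is down-weighted, and the `+ s_i` term keeps either weight from collapsing to zero.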

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · AI in Service Interactions