Learning Efficient Representations for Keyword Spotting with Triplet Loss

Roman Vygon; Nikolay Mikhaylovskiy

arXiv:2101.04792·eess.AS·February 8, 2022

TL;DR

This paper demonstrates that combining triplet loss-based embeddings with a kNN classifier significantly improves speech keyword spotting accuracy, achieving state-of-the-art results on multiple datasets.

Contribution

It introduces a novel phonetic similarity triplet mining method and shows that this combination outperforms traditional classification techniques in speech recognition tasks.

Findings

01

26% to 38% improvement in classification accuracy for convolutional networks on the LibriSpeech-derived LibriWords datasets

02

Achieved 98.55% accuracy on Google Speech Commands V1 10+2-class classification

03

Achieved 97.0% accuracy on Google Speech Commands V2 35-class

Abstract

In the past few years, triplet loss-based metric embeddings have become a de facto standard for several important computer vision problems, most notably, person reidentification. On the other hand, in the area of speech recognition the metric embeddings generated by the triplet loss are rarely used even for classification problems. We fill this gap by showing that a combination of two representation learning techniques, a triplet loss-based embedding and a variant of kNN for classification instead of cross-entropy loss, significantly (by 26% to 38%) improves the classification accuracy for convolutional networks on the LibriSpeech-derived LibriWords datasets. To do so, we propose a novel phonetic similarity-based triplet mining approach. We also improve the current best published SOTA for Google Speech Commands dataset V1 10+2-class classification by about 34%, achieving 98.55% accuracy, V2…
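
As a rough illustration of the two ingredients the abstract names, the sketch below computes a triplet loss on toy embeddings and then classifies a query with a nearest-neighbour vote. It is not the authors' implementation: the Euclidean distance, the margin value, the plain majority-vote kNN (the abstract only says "a variant of kNN"), and the 2-D embedding values are all assumptions chosen for readability.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The margin of 1.0 is an illustrative assumption, not a value from the paper.
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def knn_predict(query, train_embeddings, train_labels, k=3):
    """Majority vote among the k training embeddings nearest to the query."""
    ranked = sorted(
        zip(train_embeddings, train_labels),
        key=lambda pair: euclidean(query, pair[0]),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings" for two made-up keywords (hypothetical values).
train_embeddings = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                    (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
train_labels = ["yes", "yes", "yes", "no", "no", "no"]

# Positive close to the anchor, negative far away: the hinge clamps to zero.
loss_easy = triplet_loss((0.0, 0.0), (0.1, 0.2), (3.0, 3.0))

# Roles swapped: the "positive" is far and the "negative" is near, so the loss is large.
loss_hard = triplet_loss((0.0, 0.0), (3.0, 3.0), (0.1, 0.2))

# Classify a new embedding by its nearest neighbours instead of a softmax layer.
prediction = knn_predict((0.15, 0.05), train_embeddings, train_labels, k=3)
```

In a real system the embeddings would come from a trained network rather than hand-written tuples. The point of the sketch is only that the loss shapes distances between embeddings, and the classifier then reads labels off those distances.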

Code & Models

Repositories

roman-vygon/triplet_loss_kws
PyTorch · Official
