Efficient Keyword Spotting by capturing long-range interactions with Temporal Lambda Networks

Biel Tura; Santiago Escuder; Ferran Diego; Carlos Segura; Jordi Luque

arXiv:2104.08086·eess.AS·July 2, 2021·ASRU

TL;DR

This paper introduces a ResNet-based keyword spotting model built on temporal Lambda layers. It reaches state-of-the-art accuracy on the Google Speech Commands dataset while being 85% lighter than the Transformer-based KWT, 65% lighter than the convolutional Res15, and up to 100 times faster.

Contribution

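It pioneers the application of Lambda networks in speech, producing a lightweight architecture based on one-dimensional temporal convolutions that reaches state-of-the-art accuracy with a smaller footprint and faster inference than its Transformer-based (KWT) and convolutional (Res15) counterparts.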

Findings

01 Achieves state-of-the-art accuracy on the Google Speech Commands dataset.

02 Reduces model size by 85% relative to the Transformer-based KWT and by 65% relative to the convolutional Res15.

03 Runs up to 100 times faster at inference than those counterparts.

Abstract

Models based on attention mechanisms have shown unprecedented speech recognition performance. However, they are computationally expensive and unnecessarily complex for keyword spotting, a task targeted at small-footprint devices. This work explores the application of Lambda networks, an alternative framework for capturing long-range interactions without attention, to the keyword spotting task. We propose a novel ResNet-based model in which the residual blocks are replaced with temporal Lambda layers. Furthermore, the proposed architecture is built upon one-dimensional temporal convolutions that further reduce its complexity. The presented model not only reaches state-of-the-art accuracy on the Google Speech Commands dataset, but is also 85% and 65% lighter than its Transformer-based (KWT) and convolutional (Res15) counterparts, respectively, while being up to 100 times faster. To the best of our…
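To make the idea of capturing long-range interactions without attention more concrete, the following is a minimal NumPy sketch of a content-only lambda layer applied along the time axis. It is an illustration under stated assumptions, not the authors' implementation: the function name `content_lambda_layer`, the weight names, and the sizes (98 frames, 40 features, key depth 16) are hypothetical, and the positional lambdas, normalization layers, and temporal convolutions of the full model are omitted.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def content_lambda_layer(x, w_q, w_k, w_v):
    """Content-only lambda layer over a sequence (illustrative sketch).

    x   : (T, d) sequence of T frames with d features each
    w_q : (d, k) query projection
    w_k : (d, k) key projection
    w_v : (d, v) value projection
    Returns an array of shape (T, v).
    """
    q = x @ w_q                    # (T, k) one query per frame
    k = softmax(x @ w_k, axis=0)   # (T, k) keys normalized over the time axis
    v = x @ w_v                    # (T, v) values
    lam = k.T @ v                  # (k, v) one linear map summarizing the whole context
    return q @ lam                 # (T, v) apply the shared lambda to every query


# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
T, d, k_dim, v_dim = 98, 40, 16, 40
x = rng.standard_normal((T, d))
w_q = rng.standard_normal((d, k_dim)) * 0.1
w_k = rng.standard_normal((d, k_dim)) * 0.1
w_v = rng.standard_normal((d, v_dim)) * 0.1

y = content_lambda_layer(x, w_q, w_k, w_v)
print(y.shape)  # (98, 40)
```

In this sketch every output frame depends on the entire sequence through `lam`, yet no T-by-T attention map is ever formed. The summary `lam` has shape (k, v) regardless of sequence length, so the cost grows linearly with the number of frames, which is the kind of saving that motivates using lambda layers on small-footprint devices.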

Code & Models

Repositories

Telefonica/LambdaNetwork
PyTorch · Official
