End-to-end Keyword Spotting using Xception-1d

Iván Vallés-Pérez; Juan Gómez-Sanchis; Marcelino Martínez-Sober; Joan Vila-Francés; Antonio J. Serrano-López; Emilio Soria-Olivas

arXiv:2110.07498 · cs.CL · October 15, 2021

TL;DR

This paper presents an end-to-end keyword spotting system built on an adapted Xception-1D model. It reports state-of-the-art accuracy of about 96% across 35 categories and beats human annotation on the most complex tasks proposed.

Contribution

The work adapts the Xception architecture, which had achieved outstanding results in several computer vision tasks, to audio keyword spotting, and reports about 96% accuracy on a 35-category classification task.

Findings

01

Achieved about 96% accuracy on 35-category keyword classification

02

Outperformed human annotation on the most complex tasks proposed

03

Validated the effectiveness of Xception-1D for audio analysis

Abstract

The field of conversational agents is growing fast, and there is an increasing need for algorithms that enhance natural interaction. In this work we show how we achieved state-of-the-art results in the Keyword Spotting field by adapting and tweaking the Xception algorithm, which achieved outstanding results in several computer vision tasks. We obtained about 96% accuracy when classifying audio clips belonging to 35 different categories, beating human annotation at the most complex tasks proposed.

Code & Models

Repositories

ivallesp/xception1d
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Average Pooling · 1x1 Convolution · Dense Connections · Max Pooling · Softmax · Global Average Pooling · Convolution
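
Several of the methods listed above combine into the depthwise separable convolution, the building block that Xception is known for. The sketch below illustrates a one-dimensional version in plain Python: a depthwise step that filters each channel independently, followed by a pointwise (1x1) step that mixes channels. It is not taken from ivallesp/xception1d; all function names, shapes, and the choice of valid padding with stride 1 are assumptions made for illustration only.

```python
# Illustrative sketch only: pure-Python depthwise separable 1D convolution.
# Function names, shapes, and padding are assumptions, not the authors' code.

def depthwise_conv1d(x, kernels):
    """Filter each channel with its own kernel (valid padding, stride 1).

    x: list of channels, each a list of floats of length T.
    kernels: one kernel (list of floats of length K) per input channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[t + i] * kernel[i] for i in range(k))
            for t in range(len(channel) - k + 1)
        ])
    return out


def pointwise_conv1d(x, weights):
    """Mix channels at every time step (a 1x1 convolution).

    weights: one row per output channel, one column per input channel.
    """
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def separable_conv1d(x, kernels, weights):
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


def global_average_pool(x):
    """Collapse the time axis to a single value per channel."""
    return [sum(channel) / len(channel) for channel in x]


def param_counts(c_in, c_out, k):
    """Weight counts (biases ignored) for standard vs. separable 1D conv."""
    standard = c_in * c_out * k
    separable = c_in * k + c_in * c_out
    return standard, separable


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]   # 2 channels, 4 steps
    kernels = [[1.0, 1.0], [1.0, -1.0]]                # one kernel per channel
    weights = [[1.0, 1.0], [2.0, 0.0], [0.0, 0.5]]     # 3 output channels
    features = separable_conv1d(x, kernels, weights)
    print(features)
    print(global_average_pool(features))
    print(param_counts(128, 256, 3))
```

The appeal of the factorization is its parameter cost. With hypothetical sizes of 128 input channels, 256 output channels, and a kernel width of 3, a standard 1D convolution needs 98,304 weights, while the separable version needs 33,152.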