Unsupervised Spoken Utterance Classification

Shahab Jalalvand; Srinivas Bangalore

arXiv:2107.01068·cs.CL·July 5, 2021

TL;DR

This paper introduces an unsupervised method for spoken utterance classification that reduces the need for labeled data and improves processing speed, making it suitable for call routing in virtual assistants.

Contribution

The paper presents an unsupervised approach that pairs a KNN classifier (K=1) with an embedding model, notably ELMo, and uses an offline lookup table of uni- and bi-gram embedding vectors to keep run-time processing fast.

Findings

01

Outperforms traditional utterance classification methods, reaching a 27.0% classification error rate

02

Raises processing throughput from 16 to 118 utterances/sec

03

Requires no in-domain data beyond the intent labels and a few paraphrases per intent

Abstract

An intelligent virtual assistant (IVA) enables effortless conversations in call routing through spoken utterance classification (SUC), which is a special form of spoken language understanding (SLU). Building a SUC system requires a large amount of supervised in-domain data that is not always available. In this paper, we introduce an unsupervised spoken utterance classification approach (USUC) that does not require any in-domain data except for the intent labels and a few paraphrases per intent. USUC consists of a KNN classifier (K=1) and a complex embedding model trained on a large unlabeled customer service corpus. Among all embedding models, we demonstrate that ELMo works best for USUC. However, an ELMo model is too slow to be used at run-time for call routing. To resolve this issue, first, we compute the uni- and bi-gram embedding vectors offline and we build a…
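The idea of a 1-nearest-neighbor classifier over paraphrase embeddings, backed by precomputed n-gram vectors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the table contents are tiny made-up vectors (not ELMo outputs), the names `NGRAM_TABLE`, `embed`, `classify`, and `PARAPHRASES` are hypothetical, and averaging n-gram vectors with cosine similarity is one plausible pooling and distance choice that the abstract does not specify.

```python
import math

# Hypothetical lookup table: n-gram -> embedding vector computed offline.
# These 3-dimensional values are invented for illustration only.
NGRAM_TABLE = {
    "pay": [1.0, 0.0, 0.0],
    "bill": [0.9, 0.1, 0.0],
    "pay bill": [1.0, 0.1, 0.0],
    "cancel": [0.0, 1.0, 0.0],
    "service": [0.1, 0.8, 0.1],
    "cancel service": [0.0, 1.0, 0.1],
    "agent": [0.0, 0.0, 1.0],
    "talk": [0.1, 0.0, 0.9],
}

# A few paraphrases per intent: the only in-domain data the approach needs.
PARAPHRASES = {
    "billing": ["pay bill"],
    "cancellation": ["cancel service"],
    "agent": ["talk agent"],
}


def ngrams(tokens):
    """Return the uni-grams followed by the bi-grams of a token list."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


def embed(utterance, table=NGRAM_TABLE):
    """Average the table vectors of known n-grams (assumed pooling)."""
    vectors = [table[g] for g in ngrams(utterance.lower().split()) if g in table]
    if not vectors:
        return None  # no known n-gram, so no embedding
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(utterance, paraphrases=PARAPHRASES):
    """1-NN: return the intent of the most similar paraphrase, or None."""
    query = embed(utterance)
    if query is None:
        return None
    best_intent, best_score = None, -1.0
    for intent, examples in paraphrases.items():
        for example in examples:
            vec = embed(example)
            if vec is None:
                continue
            score = cosine(query, vec)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    print(classify("i want to pay my bill"))
    print(classify("cancel my service"))
```

Because every vector is fetched from a dictionary, no neural model runs at classification time in this sketch, which is the kind of saving a lookup table is meant to provide.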

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo