Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

Akshat Gupta; Xinjian Li; Sai Krishna Rallabandi; Alan W Black

arXiv:2011.03646·cs.CL·February 23, 2021

TL;DR

This paper introduces a novel acoustics-based intent recognition system for low-resource languages, utilizing discovered phonetic units and a CNN+LSTM architecture to improve cross-lingual transfer and zero-shot performance.

Contribution

It presents a universal phonetic recognition and intent classification system that enhances spoken dialog capabilities in low-resource languages through multilingual training.

Findings

1. Effective intent recognition in Indic and Romance languages.
2. Improved cross-lingual transfer and zero-shot performance.
3. Demonstrated benefits of multilingual training for low-resource languages.

Abstract

With recent advancements in language technologies, humans are now speaking to devices. Increasing the reach of spoken language technologies requires building systems in local languages. A major bottleneck here is the set of underlying data-intensive components that make up such systems, including automatic speech recognition (ASR) systems, which require large amounts of labelled data. With the aim of aiding the development of spoken dialog systems in low-resource languages, we propose a novel acoustics-based intent recognition system that uses discovered phonetic units for intent classification. The system is made up of two blocks: the first is a universal phone recognition system that generates a transcript of discovered phonetic units for the input audio, and the second performs intent classification from the generated phonetic transcripts. We propose a CNN+LSTM based architecture and…
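The two-block data flow described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything named here is an assumption made for illustration: `recognize_phones` is a hypothetical stub standing in for the universal phone recognizer (it looks up a fixed table rather than processing audio), the intent labels and phone sequences are invented, and `BigramIntentClassifier` is a deliberately naive phone-bigram overlap classifier swapped in for the paper's CNN+LSTM model, which is not reproduced here. The point is only to show that the second block consumes phonetic transcripts rather than word-level ASR output.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Block 1 (hypothetical stub): a real system would run a universal phone
# recognizer on audio. Here each "utterance" is just a key into a fixed table.
FAKE_PHONE_OUTPUT: Dict[str, List[str]] = {
    "utt_balance": ["b", "a", "l", "a", "n", "s"],
    "utt_transfer": ["t", "r", "a", "n", "s", "f", "e", "r"],
    "utt_noisy_balance": ["b", "a", "l", "e", "n", "s"],
}


def recognize_phones(audio_id: str) -> List[str]:
    """Return the (stubbed) phonetic transcript for an utterance id."""
    return FAKE_PHONE_OUTPUT[audio_id]


def phone_bigrams(phones: Sequence[str]) -> Counter:
    """Count adjacent phone pairs in a transcript."""
    return Counter(zip(phones, phones[1:]))


class BigramIntentClassifier:
    """Block 2 (naive stand-in, NOT the paper's CNN+LSTM): picks the intent
    whose training bigram profile overlaps most with the input transcript."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Counter] = {}

    def fit(self, transcripts: Sequence[Sequence[str]], intents: Sequence[str]) -> None:
        for phones, intent in zip(transcripts, intents):
            self.profiles.setdefault(intent, Counter()).update(phone_bigrams(phones))

    def predict(self, phones: Sequence[str]) -> str:
        grams = phone_bigrams(phones)

        def overlap(intent: str) -> int:
            profile = self.profiles[intent]
            return sum(min(count, profile[gram]) for gram, count in grams.items())

        return max(self.profiles, key=overlap)


# Usage: train on two labelled utterances, then classify a noisy transcript.
classifier = BigramIntentClassifier()
classifier.fit(
    [recognize_phones("utt_balance"), recognize_phones("utt_transfer")],
    ["check_balance", "transfer_money"],  # invented intent labels
)
predicted = classifier.predict(recognize_phones("utt_noisy_balance"))
```

Because the classifier only ever sees phone sequences, the same second block could in principle be trained for any language the phone recognizer covers; how well that transfers is what the paper evaluates, not something this sketch demonstrates.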
