Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages
Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W Black, Rajiv Ratn Shah

TL;DR
This paper explores using pre-trained language-agnostic embeddings from an acoustic model to improve spoken intent classification in low-resource languages, demonstrating notable accuracy gains across multiple languages.
Contribution
It introduces a comparative study of three pre-trained acoustic embeddings for intent classification in low-resource languages, showing their effectiveness and scalability.
Findings
Intent classification accuracy improves on the previous state of the art by approximately 2.11% for Sinhala and 7.00% for Tamil.
Competitive results achieved on English.
Performance scales positively with training data size.
Abstract
Building Spoken Language Understanding (SLU) systems that do not rely on language-specific Automatic Speech Recognition (ASR) is an important yet less explored problem in language processing. In this paper, we present a comparative study aimed at employing a pre-trained acoustic model to perform SLU in low-resource scenarios. Specifically, we use three different embeddings extracted using Allosaurus, a pre-trained universal phone decoder: (1) Phone, (2) Panphone, and (3) Allo embeddings. These embeddings are then used in identifying the spoken intent. We perform experiments across three different languages: English, Sinhala, and Tamil, each with different data sizes to simulate high, medium, and low resource scenarios. Our system improves on the state-of-the-art (SOTA) intent classification accuracy by approximately 2.11% for Sinhala and 7.00% for Tamil and achieves competitive results on…
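The general idea in the abstract, classifying intent from language-agnostic phone output rather than from a language-specific transcript, can be illustrated with a minimal sketch. This is not the authors' model: it swaps in a simple bag-of-phone-n-grams representation and a nearest-centroid classifier with cosine similarity. The phone strings are made-up stand-ins for what a universal phone recognizer such as Allosaurus might emit, and all function names (`phone_ngrams`, `train_centroids`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def phone_ngrams(phones, n=2):
    """Count single phones plus phone n-grams in a list of phone symbols."""
    feats = Counter(phones)
    for i in range(len(phones) - n + 1):
        feats[" ".join(phones[i:i + n])] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(examples):
    """Sum the features of all training utterances that share an intent."""
    centroids = {}
    for phone_string, intent in examples:
        feats = phone_ngrams(phone_string.split())
        centroids.setdefault(intent, Counter()).update(feats)
    return centroids


def classify(phone_string, centroids):
    """Return the intent whose centroid is most similar to the utterance."""
    feats = phone_ngrams(phone_string.split())
    return max(centroids, key=lambda intent: cosine(feats, centroids[intent]))


# Hypothetical phone transcriptions (assumed, not taken from any dataset).
train = [
    ("t ɜ n ɑ n ð ə l aɪ t", "lights_on"),
    ("s w ɪ tʃ ɑ n ð ə l aɪ t s", "lights_on"),
    ("p l eɪ s ʌ m m j u z ɪ k", "play_music"),
    ("p l eɪ ə s ɔ ŋ", "play_music"),
]
centroids = train_centroids(train)
prediction = classify("t ɜ n ð ə l aɪ t ɑ n", centroids)
print(prediction)
```

Because the classifier only sees phone symbols, nothing in it depends on the language's orthography or vocabulary, which is the property that makes this kind of input attractive when no language-specific ASR is available.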
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
