Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages
Jakub Hoscilowicz, Pawel Pawlowski, Marcin Skorupa, Marcin Sowański, Artur Janicki

TL;DR
This paper presents a pipeline that fine-tunes Large Language Models to machine-translate slot-annotated training data, extending Spoken Language Understanding systems to new languages and improving accuracy without altering the existing architecture.
Contribution
The authors introduce a language expansion pipeline using LLMs for machine translation of SLU data, which improves accuracy and is slot-type independent, unlike prior methods.
Findings
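Cloud scenario (mBERT model): Overall Accuracy on MultiATIS++ improved from 53% to 62.18% compared with the previous state-of-the-art method, FC-MTLF.
On-device scenario (tiny, non-pretrained SLU model): Overall Accuracy improved from 5.31% to 22.06% over the GL-CLeF baseline.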
Does not require changes in production SLU architecture or slot definitions.
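The findings above are reported as Overall Accuracy. A minimal sketch of how such a metric is commonly computed in SLU evaluation, assuming it counts an utterance as correct only when the predicted intent and every predicted slot tag match the reference (the function name and the data layout are illustrative, not taken from the paper):

```python
from typing import List, Tuple

# Assumed layout: (intent label, one slot tag per token).
Example = Tuple[str, List[str]]


def overall_accuracy(predictions: List[Example], references: List[Example]) -> float:
    """Fraction of utterances whose intent AND full slot sequence are both correct."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        1
        for (pred_intent, pred_slots), (ref_intent, ref_slots) in zip(predictions, references)
        if pred_intent == ref_intent and pred_slots == ref_slots
    )
    return correct / len(references)


# Toy data: the second utterance has one wrong slot tag, so it counts as an error.
refs = [("atis_flight", ["O", "B-toloc"]), ("atis_airfare", ["O", "O"])]
preds = [("atis_flight", ["O", "B-toloc"]), ("atis_airfare", ["O", "B-toloc"])]
score = overall_accuracy(preds, refs)
```

Because a single wrong slot tag invalidates the whole utterance, this metric is stricter than intent accuracy or slot F1 taken separately.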
Abstract
Spoken Language Understanding (SLU) models are a core component of voice assistants (VA), such as Alexa, Bixby, and Google Assistant. In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs) that we fine-tune for machine translation of slot-annotated SLU training data. Our approach improved on the MultiATIS++ benchmark, a primary multi-language SLU dataset, in the cloud scenario using an mBERT model. Specifically, we saw an improvement in the Overall Accuracy metric: from 53% to 62.18%, compared to the existing state-of-the-art method, Fine and Coarse-grained Multi-Task Learning Framework (FC-MTLF). In the on-device scenario (tiny and not pretrained SLU), our method improved the Overall Accuracy from 5.31% to 22.06% over the baseline Global-Local Contrastive Learning Framework (GL-CLeF) method. Contrary to both…
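To illustrate what "machine translation of slot-annotated SLU training data" can involve, here is a minimal sketch that turns a translated sentence with inline slot markers back into tokens and BIO labels for SLU training. The XML-like tag format, the function name `tagged_to_bio`, and the Spanish example are assumptions made for illustration; the paper's actual annotation scheme may differ.

```python
import re
from typing import List, Tuple

# Hypothetical inline format: <slot_name>value</slot_name>
_TAG = re.compile(r"<(?P<name>[\w.]+)>(?P<value>.*?)</(?P=name)>")


def tagged_to_bio(text: str) -> Tuple[List[str], List[str]]:
    """Convert an inline-tagged sentence into whitespace tokens and BIO slot labels."""
    tokens: List[str] = []
    labels: List[str] = []
    pos = 0
    for match in _TAG.finditer(text):
        # Text before the tagged span carries no slot.
        for tok in text[pos:match.start()].split():
            tokens.append(tok)
            labels.append("O")
        # First token of the span gets B-, the rest get I-.
        for i, tok in enumerate(match.group("value").split()):
            tokens.append(tok)
            labels.append(("B-" if i == 0 else "I-") + match.group("name"))
        pos = match.end()
    # Remaining text after the last tagged span.
    for tok in text[pos:].split():
        tokens.append(tok)
        labels.append("O")
    return tokens, labels


tokens, labels = tagged_to_bio(
    "vuelos a <toloc.city_name>Nueva York</toloc.city_name> mañana"
)
```

Keeping the slot markers inside the translated text means the labels follow the slot value wherever the translation places it, so no separate word-alignment step is needed in this sketch.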
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
Methods: Contrastive Learning · mBERT
