Augmenting text for spoken language understanding with Large Language Models

Roshan Sharma; Suyoun Kim; Daniel Lazar; Trang Le; Akshat Shrivastava; Kwanghoon Ahn; Piyush Kansal; Leda Sari; Ozlem Kalinli; Michael Seltzer

arXiv:2309.09390·cs.CL·September 19, 2023

Open Access

TL;DR

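This paper explores methods to improve spoken semantic parsing by augmenting training data with unpaired text (transcript-semantic parse pairs without corresponding speech), either drawn from existing textual corpora or generated by Large Language Models. On the STOP dataset, unpaired text from existing corpora improves absolute Exact Match (EM) by 2% in existing domains and 30% in new domains.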

Contribution

It introduces a novel approach of using LLMs to generate unpaired text for data augmentation in spoken language understanding, enhancing model performance across domains.

Findings

01 Unpaired text from existing corpora improves absolute Exact Match (EM) by 2% in existing domains.

02 Unpaired text generated by LLMs improves performance by 2.6% in new domains.

03 Using Joint Audio Text (JAT) and Text-to-Speech (TTS) with generated text enhances spoken semantic parsing accuracy.

Abstract

Spoken semantic parsing (SSP) involves generating machine-comprehensible parses from input speech. Training robust models for existing application domains represented in training data or extending to new domains requires corresponding triplets of speech-transcript-semantic parse data, which is expensive to obtain. In this paper, we address this challenge by examining methods that can use transcript-semantic parse data (unpaired text) without corresponding speech. First, when unpaired text is drawn from existing textual corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways to generate speech representations for unpaired text. Experiments on the STOP dataset show that unpaired text from existing and new domains improves performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we consider the setting when unpaired text is not available in…
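The abstract reports gains in absolute Exact Match (EM). As a minimal sketch of how such a number can be computed, the snippet below scores a prediction as correct only when it equals the reference parse, then takes the difference between two systems in percentage points. The function name `exact_match`, the whitespace normalization, and the example parses are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predicted parses that equal their reference parse.

    Assumption: parses are compared token by token after whitespace
    splitting, so differences in spacing alone do not count as errors.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Hypothetical reference parses, written in a bracketed intent/slot style.
references = [
    "[IN:GET_WEATHER [SL:LOCATION boston ] ]",
    "[IN:CREATE_ALARM [SL:DATE_TIME at 7 am ] ]",
    "[IN:PLAY_MUSIC [SL:MUSIC_GENRE jazz ] ]",
    "[IN:GET_WEATHER [SL:DATE_TIME tomorrow ] ]",
]

# Hypothetical outputs of a baseline model (2 of 4 parses correct).
baseline = [
    "[IN:GET_WEATHER [SL:LOCATION boston ] ]",
    "[IN:CREATE_ALARM [SL:DATE_TIME 7 am ] ]",
    "[IN:PLAY_MUSIC [SL:MUSIC_GENRE jazz ] ]",
    "[IN:GET_WEATHER ]",
]

# Hypothetical outputs of a model trained with extra unpaired text (3 of 4 correct).
augmented = [
    "[IN:GET_WEATHER [SL:LOCATION boston ] ]",
    "[IN:CREATE_ALARM [SL:DATE_TIME at 7 am ] ]",
    "[IN:PLAY_MUSIC [SL:MUSIC_GENRE jazz ] ]",
    "[IN:GET_WEATHER ]",
]

baseline_em = exact_match(baseline, references)
augmented_em = exact_match(augmented, references)

# An "absolute" gain is the plain difference in percentage points,
# not a change relative to the baseline score.
absolute_gain = augmented_em - baseline_em
print(f"baseline EM = {baseline_em:.1f}, augmented EM = {augmented_em:.1f}, "
      f"absolute gain = {absolute_gain:.1f} points")
```

Reporting the gain as an absolute difference keeps results comparable across domains whose baselines differ widely, which matters when a new domain starts from a much lower score than an existing one.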

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis