Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding

Yingmei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, and Daxin Jiang

arXiv:2109.01583 · cs.CL · September 6, 2021

TL;DR

This paper introduces a denoising training method that leverages multiple models trained on various augmented datasets to improve cross-lingual spoken language understanding in low-resource languages, outperforming previous methods.

Contribution

The paper proposes a novel denoising training approach that uses mutual supervision among models trained on different augmented data to mitigate noise and enhance SLU performance.

Findings

1. Outperforms the previous state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively.
2. Effectively mitigates noise in augmented datasets.
3. Improves SLU accuracy in low-resource languages.

Abstract

Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy, and thus impede the performance of SLU models. In this paper, we focus on mitigating noise in augmented data. We develop a denoising training approach. Multiple models are trained with data produced by various augmentation methods. Those models provide supervision signals to each other. The experimental results show that our method outperforms the existing state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively. The code will be open-sourced on GitHub.
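The abstract states only that models trained on different augmented data sets provide supervision signals to each other; it does not specify the loss. As a rough illustration, the Python sketch below assumes one common form of mutual supervision. Each model's training target blends the (possibly noisy) label from its own augmented data set with the averaged predictions of the other models, similar in spirit to deep mutual learning. The function names (`softmax`, `peers_of`, `blended_target`, `mutual_supervision_loss`) and the mixing weight `alpha` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: every name and the mixing weight `alpha` are
# illustrative assumptions, not the paper's published implementation.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def peers_of(all_probs: Sequence[Sequence[float]], k: int) -> List[Sequence[float]]:
    """Return predictions from every model except model k (its peers)."""
    return [p for i, p in enumerate(all_probs) if i != k]


def blended_target(noisy_label: int,
                   peer_probs: Sequence[Sequence[float]],
                   num_classes: int,
                   alpha: float) -> List[float]:
    """Mix the possibly noisy one-hot label with the peers' average prediction.

    Peer probabilities are treated as fixed targets, so no gradient flows
    into the peer models through this term.
    """
    one_hot = [1.0 if c == noisy_label else 0.0 for c in range(num_classes)]
    if not peer_probs:
        return one_hot
    peer_avg = [sum(p[c] for p in peer_probs) / len(peer_probs)
                for c in range(num_classes)]
    return [(1.0 - alpha) * one_hot[c] + alpha * peer_avg[c]
            for c in range(num_classes)]


def mutual_supervision_loss(own_logits: Sequence[float],
                            noisy_label: int,
                            peer_probs: Sequence[Sequence[float]],
                            alpha: float = 0.5):
    """Cross-entropy of one model's prediction against the blended target.

    Because cross-entropy is linear in the target, this equals
    (1 - alpha) * CE(noisy label) + alpha * CE(peer average).
    Returns (loss, gradient w.r.t. own_logits). For softmax followed by
    cross-entropy with a target t that sums to 1, that gradient is p - t.
    """
    p = softmax(own_logits)
    t = blended_target(noisy_label, peer_probs, len(own_logits), alpha)
    loss = -sum(tc * math.log(max(pc, 1e-12)) for tc, pc in zip(t, p))
    grad = [pc - tc for pc, tc in zip(p, t)]
    return loss, grad
```

For example, suppose an utterance's augmented label says class 2, but both peer models (trained on other augmented sets) put most of their probability on class 0. Then `blended_target(2, [[0.8, 0.1, 0.1], [0.6, 0.2, 0.2]], 3, 0.5)` gives roughly `[0.35, 0.075, 0.575]`, so the model is pulled partly toward the peers' consensus instead of fitting a possibly wrong label outright. In a training round over K models, model k would use `peers_of(all_probs, k)` as its peer predictions.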

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis