El Volumen Louder Por Favor: Code-switching in Task-oriented Semantic Parsing

Arash Einolghozati; Abhinav Arora; Lorena Sainz-Maza Lecanda; Anuj Kumar; Sonal Gupta

arXiv:2101.10524 · cs.CL · January 29, 2021

TL;DR

This paper introduces CSTOP, a dataset of 5800 Spanish-English code-switched utterances with their semantic parses, and proposes two data augmentation methods for the zero-shot and few-shot settings that narrow the accuracy gap by two thirds.

Contribution

The work provides a new dataset for code-switching semantic parsing and novel data augmentation techniques to enhance model performance with limited data.

Findings

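1. Pre-trained cross-lingual models generalize to code-switched utterances even when training data is available for only one language.

2. The proposed data augmentation methods improve zero-shot and few-shot parsing accuracy.

3. Combining the few-shot setting with the augmentation methods reduces the accuracy gap by two thirds.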

Abstract

Being able to parse code-switched (CS) utterances, such as Spanish+English or Hindi+English, is essential to democratize task-oriented semantic parsing systems for certain locales. In this work, we focus on Spanglish (Spanish+English) and release a dataset, CSTOP, containing 5800 CS utterances alongside their semantic parses. We examine the CS generalizability of various cross-lingual (XL) models and show the advantage of pre-trained XL language models when data for only one language is present. As such, we focus on improving the pre-trained models for the case when only an English corpus, alongside either zero or a few CS training instances, is available. We propose two data augmentation methods for the zero-shot and the few-shot settings: fine-tune using translate-and-align and augment using a generation model followed by match-and-filter. Combining the few-shot setting with the above…
