Universal Dependency Parsing for Hindi-English Code-switching

Irshad Ahmad Bhat; Riyaz Ahmad Bhat; Manish Shrivastava; Dipti Misra Sharma

arXiv:1804.05868·cs.CL·April 25, 2018

TL;DR

This paper develops a neural dependency parser for Hindi-English code-switching Twitter data, creates a new treebank for that data, and introduces normalization and back-transliteration models to improve parsing accuracy.

Contribution

The paper presents a novel Hindi-English code-switching treebank under Universal Dependencies and a neural stacking parser that leverages multilingual syntactic information.

Findings

1. The neural stacking parser outperforms the augmented parsing model by 1.5% LAS points.

2. The decoding process improves results by 3.8% LAS points over first-best normalization and/or back-transliteration.

3. A new code-switching treebank was created for Hindi-English Twitter data.
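
The findings above are reported in LAS (labeled attachment score), the percentage of tokens whose predicted head and dependency label both match the gold annotation. As a minimal sketch of that standard definition (not the authors' evaluation script, which is not shown here), the hypothetical helper `attachment_scores` below computes UAS and LAS for one sentence; the function name and the toy arcs are assumptions made for illustration only.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    # UAS counts tokens with the correct head; LAS also requires the correct label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Toy four-token sentence (invented arcs, not taken from the treebank).
gold_arcs = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred_arcs = [(2, "nsubj"), (0, "root"), (4, "amod"), (3, "obj")]

uas, las = attachment_scores(gold_arcs, pred_arcs)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")  # UAS = 75.0, LAS = 50.0
```

In this toy case three of four heads are correct but only two tokens also carry the correct label. A difference such as 1.5 LAS points between two parsers is a difference in this percentage, computed over all tokens in the test set.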

Abstract

Code-switching is a phenomenon of mixing grammatical structures of two or more languages under varied social constraints. The code-switching data differ so radically from the benchmark corpora used in the NLP community that the application of standard technologies to these data degrades their performance sharply. Unlike standard corpora, these data often need to go through additional processes such as language identification, normalization and/or back-transliteration for their efficient processing. In this paper, we investigate these indispensable processes and other problems associated with syntactic parsing of code-switching data and propose methods to mitigate their effects. In particular, we study dependency parsing of code-switching data of Hindi and English multilingual speakers from Twitter. We present a treebank of Hindi-English code-switching tweets under Universal Dependencies…
