# Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data

**Authors:** Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava, Dipti Misra Sharma

arXiv: 1703.10772 · 2017-04-03

## TL;DR

This paper introduces resource-efficient methods for parsing code-mixed language data by leveraging existing monolingual treebanks, demonstrating improved results and providing a new annotated dataset of Hindi-English tweets.

## Contribution

The paper presents parsing strategies for code-mixed data that train on pre-existing monolingual annotated resources instead of in-domain annotations, and it reports significantly better results than an informed baseline.

## Key findings

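- The proposed methods produce significantly better results than an informed baseline
- New evaluation dataset of 450 Hindi-English code-mixed tweets, manually annotated with Universal Dependencies
- Training leverages existing monolingual treebanks rather than in-domain annotations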
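
To make the annotation format concrete, here is a minimal sketch of how a Universal Dependencies analysis of one code-mixed sentence could be stored in CoNLL-U and read back. The sentence, its analysis, and the `Lang=` key in the last column are illustrative assumptions, not taken from the paper's dataset, and `read_conllu` is a hypothetical helper rather than part of any released tooling.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment for a Hindi-English sentence
# ("this movie was very boring"). The analysis and the Lang= key
# in the MISC column are assumptions made for illustration only.
SAMPLE = """\
# text = yeh movie bahut boring thi
1\tyeh\t_\tDET\t_\t_\t2\tdet\t_\tLang=hi
2\tmovie\t_\tNOUN\t_\t_\t4\tnsubj\t_\tLang=en
3\tbahut\t_\tADV\t_\t_\t4\tadvmod\t_\tLang=hi
4\tboring\t_\tADJ\t_\t_\t0\troot\t_\tLang=en
5\tthi\t_\tAUX\t_\t_\t4\tcop\t_\tLang=hi
"""


def read_conllu(text):
    """Parse one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        # MISC holds |-separated key=value pairs, or "_" when empty
        misc = dict(
            item.split("=", 1) for item in cols[9].split("|") if "=" in item
        )
        tokens.append(
            {
                "id": int(cols[0]),
                "form": cols[1],
                "upos": cols[3],
                "head": int(cols[6]),
                "deprel": cols[7],
                "lang": misc.get("Lang"),
            }
        )
    return tokens


tokens = read_conllu(SAMPLE)
root = next(t for t in tokens if t["head"] == 0)
lang_counts = Counter(t["lang"] for t in tokens)
print(root["form"], dict(lang_counts))
```

In this toy analysis the English adjective heads Hindi dependents, which is the kind of cross-language attachment that a parser trained on monolingual treebanks has to handle.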

## Abstract

In this paper, we propose efficient and less resource-intensive strategies for parsing of code-mixed data. These strategies are not constrained by in-domain annotations; rather, they leverage pre-existing monolingual annotated resources for training. We show that these methods can produce significantly better results than an informed baseline. In addition, we present a data set of 450 Hindi and English code-mixed tweets of Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.10772/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1703.10772/full.md

---
Source: https://tomesphere.com/paper/1703.10772