Part of speech tagging for code switched data

Fahad AlGhamdi; Giovanni Molina; Mona Diab; Thamar Solorio; Abdelati Hawwari; Victor Soto; Julia Hirschberg

arXiv:1909.13006 · cs.CL · November 5, 2019

TL;DR

This paper investigates strategies for Part of Speech (POS) tagging of code-switched (CS) data in two language pairs, Spanish-English and Modern Standard Arabic with Arabic dialects, and finds that a machine learning approach built on two POS taggers performs best.

Contribution

It introduces a machine learning framework for code-switched data that is built on two state-of-the-art POS taggers, and shows that it is more accurate than the other strategies compared.

Findings

1. Using two POS taggers outperforms using a single tagger on CS data.
2. A unified tagger trained on CS data is competitive.
3. A machine learning approach yields the best results in the experiments.

Abstract

We address the problem of Part of Speech (POS) tagging in the context of linguistic code switching (CS). CS is the phenomenon in which a speaker switches between two languages, or between variants of the same language, within or across utterances; these cases are known as intra-sentential and inter-sentential CS, respectively. Intra-sentential CS is especially challenging for state-of-the-art NLP technology, because that technology is monolingual and geared toward processing one language at a time. In this paper we explore multiple strategies for applying state-of-the-art POS taggers to CS data. We investigate the landscape in two CS language pairs, Spanish-English and Modern Standard Arabic-Arabic dialects. We compare using two POS taggers against using a unified tagger trained on CS data. Our results show that applying a machine learning framework using two state-of-the-art POS taggers achieves…
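The abstract does not say how the two taggers are combined. As a minimal, hypothetical sketch of one way a learned combination could work (not the authors' framework), assume every token of a Spanish-English sentence has already been tagged by an English tagger and by a Spanish tagger and has a language ID ("en" or "es"). A count-based classifier then learns, from gold-annotated CS data, which final tag to emit for each (English tag, Spanish tag, language ID) triple. The names TagCombiner and combine_sentence, the fallback rule, and the toy training triples are illustrative assumptions only.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical feature triple: (tag from English tagger, tag from Spanish tagger, language ID).
Feature = Tuple[str, str, str]


class TagCombiner:
    """Count-based combiner (illustrative assumption): for each feature triple
    seen in gold-annotated CS training data, count how often each gold tag occurred."""

    def __init__(self) -> None:
        self.table: DefaultDict[Feature, Counter] = defaultdict(Counter)

    def fit(self, examples: List[Tuple[Feature, str]]) -> "TagCombiner":
        for feat, gold_tag in examples:
            self.table[feat][gold_tag] += 1
        return self

    def predict(self, feat: Feature) -> str:
        if feat in self.table:
            # Most frequent gold tag observed with this triple.
            return self.table[feat].most_common(1)[0][0]
        # Unseen triple: fall back to the tagger that matches the token's language.
        en_tag, es_tag, lang = feat
        return en_tag if lang == "en" else es_tag


def combine_sentence(combiner: TagCombiner, en_tags: List[str],
                     es_tags: List[str], lang_ids: List[str]) -> List[str]:
    """Apply the combiner token by token to one code-switched sentence."""
    return [combiner.predict((a, b, lang)) for a, b, lang in zip(en_tags, es_tags, lang_ids)]


# Toy, made-up training triples (not data from the paper).
train = [
    (("NOUN", "PROPN", "en"), "NOUN"),
    (("NOUN", "PROPN", "en"), "NOUN"),
    (("NOUN", "PROPN", "en"), "PROPN"),
    (("X", "VERB", "es"), "VERB"),
]
combiner = TagCombiner().fit(train)
print(combine_sentence(combiner, ["NOUN", "X", "ADJ"], ["PROPN", "VERB", "NOUN"], ["en", "es", "es"]))
# prints ['NOUN', 'VERB', 'NOUN']
```

A practical system would more likely replace the lookup table with a trained classifier over richer features (for example, neighboring words and tags), but the shape of the problem stays the same: map the two taggers' proposals plus the language ID to a single tag.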
