Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task

Gustavo Aguilar; Fahad AlGhamdi; Victor Soto; Mona Diab; Julia Hirschberg; Thamar Solorio

arXiv:1906.04138 · cs.CL · June 11, 2019 · 1 citation

TL;DR

This paper presents an overview of the CALCS 2018 shared task on Named Entity Recognition (NER) for code-switched social media data, introducing a new dataset and benchmarking NER performance on two language pairs: English-Spanish and Modern Standard Arabic-Egyptian.

Contribution

It introduces a new Twitter dataset for code-switched NER annotated with 9 entity types, provides benchmark results, and analyzes the challenges of this difficult task and the approaches taken by participants.

Findings

1. Best scores of 63.76% (ENG-SPA) and 71.61% (MSA-EGY) for the two language pairs
2. Diverse entity types and social media noise increase task difficulty
3. Submissions from 9 participants analyzed, with their most common challenges discussed

Abstract

In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social media data. We divide the shared task into two competitions based on the English-Spanish (ENG-SPA) and Modern Standard Arabic-Egyptian (MSA-EGY) language pairs. We use Twitter data and 9 entity types to establish a new dataset for code-switched NER benchmarks. In addition to the code-switching (CS) phenomenon itself, the diversity of the entities and the challenges of social media text make the task considerably difficult. As a result, the best scores in the two competitions are 63.76% and 71.61% for ENG-SPA and MSA-EGY, respectively. We present the scores of 9 participants and discuss the most common challenges among submissions.
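The abstract reports best scores without naming the metric. As an illustration only, the sketch below shows entity-level precision, recall, and F1 with exact span matching over BIO tags, a common way NER systems are scored. It is an assumption, not the shared task's official scorer, which may differ. The function names `bio_to_spans` and `entity_f1`, the `LOC`/`ORG` labels, and the example tweet are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

# An entity span: (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of (start, end, type) entity spans."""
    spans: Set[Span] = set()
    start = 0
    etype: Optional[str] = None
    # A sentinel "O" at the end closes an entity that runs to the last token
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # I-X continues the currently open entity of type X
        # Any other tag closes the open entity, if there is one
        if etype is not None:
            spans.add((start, i, etype))
            etype = None
        if tag != "O":
            # B-X opens a new entity; a stray I-X is leniently treated as B-X
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 with exact span matching."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # same boundaries and same type
        fp += len(p - g)  # predicted entities with no exact gold match
        fn += len(g - p)  # gold entities the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ENG-SPA code-switched tweet with illustrative LOC/ORG labels
tokens = ["Mi", "hermana", "vive", "en", "New", "York", "and", "works", "at", "Google"]
gold = ["O", "O", "O", "O", "B-LOC", "I-LOC", "O", "O", "O", "B-ORG"]
pred = ["O", "O", "O", "O", "B-LOC", "O", "O", "O", "O", "B-ORG"]  # truncated "New York"

print(entity_f1([gold], [pred]))  # (0.5, 0.5, 0.5)
```

Under exact matching, the truncated "New York" prediction counts as both a false positive and a false negative, so a single boundary error costs precision and recall at once. This is one reason noisy, mixed-language text can pull entity-level scores down sharply.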

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification