Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press

Elena Álvarez Mellado; Luis Espinosa Anke; Julio Gonzalo Arroyo; Constantine Lignos; Jordi Porta Zamorano

arXiv:2110.15682·cs.CL·November 1, 2021

TL;DR

This paper reviews the ADoBo 2021 shared task on detecting lexical borrowings, mostly from English, in Spanish newswire text, and reports how the participating systems performed.

Contribution

It presents the shared task setup, dataset, participant results, and insights into the difficulty of automatic borrowing detection in Spanish.

Findings

01 System F1 scores ranged from 37 to 85.

02 Detection is especially challenging for out-of-domain and out-of-vocabulary (OOV) words.

03 Traditional methods informed by lexicographic information would benefit from current NLP techniques.

Abstract

This paper summarizes the main findings of the ADoBo 2021 shared task, proposed in the context of IberLEF 2021. In this task, we invited participants to detect lexical borrowings (coming mostly from English) in Spanish newswire texts. The task was framed as a sequence classification problem using BIO encoding. We provided participants with an annotated corpus of lexical borrowings, divided into training, development and test splits. We received submissions from 4 teams, with 9 different system runs overall. The results, which range from F1 scores of 37 to 85, suggest that this is a challenging task, especially when out-of-domain or out-of-vocabulary (OOV) words are considered, and that traditional methods informed with lexicographic information would benefit from taking advantage of current NLP trends.
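
To make the BIO framing and the F1 figures more concrete, here is a minimal Python sketch of how BIO tags can be decoded into borrowing spans and scored. It rests on several assumptions that the abstract does not confirm: the label name `ENG`, the function names `bio_to_spans` and `span_f1`, and exact-match span-level scoring are all illustrative, and the example sentence is invented. The function returns F1 as a fraction between 0 and 1, whereas the abstract reports scores on a 0 to 100 scale.

```python
def bio_to_spans(tags):
    """Decode BIO tags into a set of (start, end, label) spans; end is exclusive."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its label matches.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Lenient choice (an assumption): a stray I- tag opens a new span.
            start, label = i, tag[2:]
    if start is not None:
        spans.add((start, len(tags), label))
    return spans


def span_f1(gold_tags, pred_tags):
    """Exact-match span-level F1 between gold and predicted BIO sequences."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example sentence with two borrowings: "bestsellers" and "animal print".
tokens = ["Las", "prendas", "bestsellers", "llevan", "un", "animal", "print"]
gold = ["O", "O", "B-ENG", "O", "O", "B-ENG", "I-ENG"]
# Hypothetical system output that truncates the multiword borrowing.
pred = ["O", "O", "B-ENG", "O", "O", "B-ENG", "O"]

print(sorted(bio_to_spans(gold)))
print(span_f1(gold, pred))
```

Under exact-match scoring, the truncated prediction for "animal print" counts as both a false positive and a false negative, which is one reason multiword borrowings can lower scores in this kind of evaluation.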

Taxonomy

Topics: Linguistics, Language Diversity, and Identity · Lexicography and Language Studies · Swearing, Euphemism, Multilingualism

Methods: Test