Overview of GUA-SPA at IberLEF 2023: Guarani-Spanish Code Switching Analysis
Luis Chiruzzo, Marvin Agüero-Torales, Gustavo Giménez-Lugo, Aldo Alvarez, Yliana Rodríguez, Santiago Góngora, Thamar Solorio

TL;DR
This paper introduces the first shared task on Guarani-Spanish code-switching analysis, covering token-level language identification, named entity recognition (NER), and classification of how Spanish spans are used, together with an annotated corpus and evaluation results from three teams.
Contribution
It presents a new benchmark dataset and tasks for Guarani-Spanish code-switching, enabling future research in this low-resource language pair.
Findings
Generally good results in language identification (Task 1)
More mixed results in NER (Task 2) and Spanish span usage classification (Task 3)
Annotated corpus of 1500 texts from news articles and tweets, around 25,000 tokens
Abstract
We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from news articles and tweets, around 25 thousand tokens, with the information for the tasks. Three teams took part in the evaluation phase, obtaining in general good results for Task 1, and more mixed results for Tasks 2 and 3.
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Language and cultural evolution
