Common Ground, Diverse Roots: The Difficulty of Classifying Common Examples in Spanish Varieties
Javier A. Lopetegui, Arij Riabi, Djamé Seddah

TL;DR
This paper investigates the challenge of classifying Spanish language varieties, especially common examples that overlap across varieties, and introduces a new dataset for Cuban Spanish to improve variety identification accuracy.
Contribution
It proposes a method using training dynamics and label confidence to detect common examples and errors, and introduces the first Cuban Spanish variety dataset with annotations.
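The training-dynamics idea can be illustrated with a short Python sketch. It assumes the per-epoch class probabilities for one example have already been recorded, and it computes Datamap-style statistics: the mean and spread of the probability given to the annotated label, plus the mean probability of whichever label the model favors at each epoch. The function names, the 0.5 threshold, and the reading of "predicted label confidence" as the mean top-class probability are assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def datamap_stats(epoch_probs, gold_label):
    """Summarize training dynamics for a single example.

    epoch_probs: one list of class probabilities per training epoch.
    gold_label: index of the annotated class.
    """
    gold = [probs[gold_label] for probs in epoch_probs]
    # Assumption: "predicted label confidence" is read here as the mean
    # probability of the top-scoring class at each epoch.
    pred = [max(probs) for probs in epoch_probs]
    return {
        "gold_confidence": mean(gold),
        "gold_variability": pstdev(gold),
        "pred_confidence": mean(pred),
    }


def is_candidate(stats, threshold=0.5):
    """Flag an example as a possible common example or label error.

    The threshold is a hypothetical value; a real study would tune it.
    """
    return stats["gold_confidence"] < threshold


# Toy example: two varieties, three epochs, annotated as class 0.
example = [[0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
stats = datamap_stats(example, gold_label=0)
print(stats, is_candidate(stats))
```

In this toy case the model drifts away from the annotated label over training, so the example is flagged for inspection. Whether it is a common example valid in several varieties or an annotation error still has to be judged by a person.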
Findings
Effective detection of common examples improves classification accuracy.
Predicted label confidence enhances model performance in variety identification.
The paper introduces the first annotated dataset for Cuban Spanish variety identification.
Abstract
Variations in languages across geographic regions or cultures are crucial to address to avoid biases in NLP systems designed for culturally sensitive tasks, such as hate speech detection or dialog with conversational agents. In languages such as Spanish, where varieties can significantly overlap, many examples can be valid across them, which we refer to as common examples. Ignoring these examples may cause misclassifications, reducing model accuracy and fairness. Therefore, accounting for these common examples is essential to improve the robustness and representativeness of NLP systems trained on such data. In this work, we address this problem in the context of Spanish varieties. We use training dynamics to automatically detect common examples or errors in existing Spanish datasets. We demonstrate the efficacy of using predicted label confidence for our Datamaps…
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices
