AfroScope: A Framework for Studying the Linguistic Landscape of Africa
Sang Yun Kwon, AbdelRahim Elmadany, Muhammad Abdul-Mageed

TL;DR
AfroScope is a unified framework for African language identification (LID). It comprises a dataset covering 713 African languages, a suite of LID models, and a hierarchical classification approach that better separates closely related languages. Together these support large-scale linguistic analysis.
Contribution
The paper presents AfroScope, a unified framework that pairs a large African LID dataset (AfroScope-Data) and models (AfroScope-Models) with a hierarchical classification approach for better differentiating closely related languages.
Findings
Hierarchical classification improves macro F1 by 4.55 on confusable languages
AfroScope covers 713 African languages with strong LID models
Analysis of cross-linguistic transfer informs robust system development
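Macro F1 is the unweighted mean of per-language F1 scores, so every language counts equally regardless of how much test data it has. This is why gains concentrated on a handful of confusable languages register clearly in the headline number. Below is a minimal sketch of the standard computation on a 0 to 100 scale. It assumes the reported 4.55 is expressed on that scale, which this page does not state. The toy labels and data are invented for illustration.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 on a 0-100 scale: the unweighted mean of per-label F1.

    The label set is the union of gold and predicted labels, which is also the
    default label set documented for scikit-learn's f1_score(average="macro").
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    per_label = []
    for label in sorted(set(gold) | set(pred)):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        per_label.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    if not per_label:
        return 0.0  # no labels at all: define the score as 0
    return 100 * sum(per_label) / len(per_label)


# Toy data (invented): Zulu/Xhosa confused once in each direction, Swahili always right.
gold = ["zul", "zul", "xho", "xho", "swa", "swa"]
pred = ["zul", "xho", "xho", "zul", "swa", "swa"]
print(round(macro_f1(gold, pred), 2))  # 66.67
print(round(macro_f1(gold, gold), 2))  # 100.0
```

Because all three labels carry equal weight, fixing just the two Zulu/Xhosa errors lifts the score from 66.67 to 100. That illustrates why macro F1 is the natural metric for reporting improvements on closely related languages.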
Abstract
Language Identification (LID) is the task of determining the language of a given text and is a fundamental preprocessing step that affects the reliability of downstream NLP applications. While recent work has expanded LID coverage for African languages, existing approaches remain limited in (i) the number of supported languages and (ii) their ability to make fine-grained distinctions among closely related varieties. We introduce AfroScope, a unified framework for African LID that includes AfroScope-Data, a dataset covering 713 African languages, and AfroScope-Models, a suite of strong LID models with broad language coverage. To better distinguish highly confusable languages, we propose a hierarchical classification approach that leverages Mirror-Serengeti, a specialized embedding model targeting 29 closely related or geographically proximate languages. This approach improves macro F1 by…
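The abstract does not spell out how the hierarchy routes inputs. One plausible reading of a two-stage classifier is sketched below under explicit assumptions, and it is not AfroScope's or Mirror-Serengeti's actual method or API. A broad first-stage model predicts a label. If that label belongs to the confusable group (in the paper's setting, presumably the 29 languages Mirror-Serengeti targets), a second stage re-decides among group members only, here by nearest-centroid cosine similarity over embeddings. The following names are all hypothetical stand-ins: `hierarchical_predict`, `Coarse`, `Embed`, the stub models, the centroids, and the routing rule (top-1 label membership).

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
# Hypothetical interfaces: a first-stage LID model returning label -> score,
# and an embedding function used only by the second stage.
Coarse = Callable[[str], Dict[str, float]]
Embed = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hierarchical_predict(text: str, coarse: Coarse, embed: Embed,
                         centroids: Dict[str, Vector]) -> str:
    """Two-stage LID: broad model first, embedding re-ranking for confusable labels.

    `centroids` holds one reference vector per language in the confusable
    group; its keys define that group.
    """
    scores = coarse(text)
    top = max(scores, key=scores.get)
    if top not in centroids:
        return top  # outside the confusable group: keep the first-stage label
    vec = embed(text)
    # Re-decide only among confusable-group members, by nearest centroid.
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy stubs (invented for illustration, not trained models).
def toy_coarse(text: str) -> Dict[str, float]:
    if "sawubona" in text or "molo" in text:
        return {"zul": 0.6, "xho": 0.4}  # first stage leans Zulu either way
    return {"swa": 0.9, "zul": 0.1}


def toy_embed(text: str) -> Vector:
    return [1.0, 0.0] if "molo" in text else [0.0, 1.0]


centroids = {"zul": [0.0, 1.0], "xho": [1.0, 0.0]}
print(hierarchical_predict("molo", toy_coarse, toy_embed, centroids))      # xho
print(hierarchical_predict("sawubona", toy_coarse, toy_embed, centroids))  # zul
print(hierarchical_predict("habari", toy_coarse, toy_embed, centroids))    # swa
```

Restricting the second stage to the confusable group leaves the broad model's decisions for every other language untouched, so any change in accuracy is confined to the hard cases. A real system might instead route on a score margin or on the top-k labels; the top-1 rule here is only the simplest choice.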
Taxonomy
Topics: Authorship Attribution and Profiling · Language and cultural evolution · Linguistic Variation and Morphology
