Guylingo: The Republic of Guyana Creole Corpora
Christopher Clarke, Roland Daynauth, Charlene Wilkinson, Hubert Devonish, Jason Mars

TL;DR
This paper introduces Guylingo, a new corpus for Guyanese Creole (Creolese) that addresses the scarcity of computational resources for this low-resource language and examines the challenges and opportunities of NLP for Creole languages.
Contribution
The paper presents a comprehensive corpus for Guyanese Creole and discusses methods for data collection, NLP challenges, and opportunities for language policy advancement.
Findings
Created a diverse Guylingo corpus including colloquialisms and regional variations
Identified key challenges in machine translation for Creole languages
Highlighted potential for NLP to support official language recognition
Abstract
While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support. One such region is the Caribbean. While commonly labeled as "English speaking", the ex-British Caribbean region consists of a myriad of Creole languages thriving alongside English. In this paper, we present Guylingo: a comprehensive corpus designed for advancing NLP research in the domain of Creolese (Guyanese English-lexicon Creole), the most widely spoken language in the culturally rich nation of Guyana. We first outline our framework for gathering and digitizing this diverse corpus, inclusive of colloquial expressions, idioms, and regional variations in a low-resource language. We then demonstrate the challenges of training and evaluating NLP models for…
Taxonomy
Topics: Caribbean history, culture, and politics · Linguistic Variation and Morphology · Migration, Identity, and Health
