Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin

Pin-Jie Lin; Merel Scholman; Muhammed Saeed; Vera Demberg

arXiv:2404.18264 · cs.CL · April 30, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a phonetic-theoretic framework for modeling orthographic variation in Nigerian Pidgin. It uses the framework to augment training data with relevant orthographic variants, which improves performance on NLP tasks such as machine translation and sentiment analysis.
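To make rule-based orthographic variation concrete, here is a minimal Python sketch of word-level variant generation. It assumes a small, hypothetical table of phonetically motivated spelling alternations (`RULES`, e.g. "th" written as "t" or "d", "ey" written as "e"); the rules and the helper names `segment` and `variants` are illustrative inventions, not the rule set or code from the paper.

```python
from itertools import product

# Hypothetical, phonetically motivated spelling alternations. Each key may be
# written in any of the listed ways. Illustrative assumptions only; this is
# not the rule set from the paper.
RULES: dict[str, list[str]] = {
    "th": ["th", "t", "d"],  # dental fricative written as a stop
    "ey": ["ey", "e"],       # e.g. "dey" vs "de"
    "tt": ["tt", "t"],       # doubled vs single consonant, e.g. "wettin" vs "wetin"
}


def segment(word: str) -> list[str]:
    """Split a word left to right into rule keys and single characters."""
    pieces: list[str] = []
    i = 0
    while i < len(word):
        for key in RULES:
            if word.startswith(key, i):
                pieces.append(key)
                i += len(key)
                break
        else:  # no rule key starts at position i
            pieces.append(word[i])
            i += 1
    return pieces


def variants(word: str) -> set[str]:
    """Return every spelling produced by the rules, including the original."""
    options = [RULES.get(piece, [piece]) for piece in segment(word)]
    return {"".join(choice) for choice in product(*options)}


if __name__ == "__main__":
    for w in ["the", "dey", "wettin"]:
        print(w, sorted(variants(w)))
```

For example, `variants("wettin")` returns {"wettin", "wetin"} and `variants("the")` returns {"the", "te", "de"}. This sketch enumerates every combination of rule applications; a framework grounded in observed data would more plausibly restrict which edits are allowed and for which words.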

Contribution

The paper is the first to systematically model and generate orthographic variants of Nigerian Pidgin, and it uses these variants as data augmentation to improve NLP performance.

Findings

01

Sentiment analysis performance improved by 2.1 points.

02

Translation BLEU score increased by 1.4 points.

03

Modeling orthographic variation improves performance on both tasks.

Abstract

Nigerian Pidgin is an English-derived contact language that is traditionally oral and is spoken by approximately 100 million people. No orthographic standard has yet been adopted, so the few available Pidgin datasets are characterised by noise in the form of orthographic variation. This contributes to the under-performance of models on critical NLP tasks. The current work is the first to describe the types of orthographic variation commonly found in Nigerian Pidgin texts and to model this variation. The variations identified in the dataset form the basis of a phonetic-theoretic framework for word editing, which is used to generate orthographic variants to augment training data. We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis. The proposed variation generation framework augments the…
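The augmentation step described in the abstract can be sketched as follows, assuming a hypothetical lookup table of alternative spellings (`VARIANT_TABLE`) and hypothetical helpers `respell` and `augment`. Each training pair keeps its target unchanged (a sentiment label, or the English reference for translation) while only the Pidgin side is respelled. This is an illustration under those assumptions, not the authors' implementation.

```python
import random

# Hypothetical table of alternative spellings for a few frequent Nigerian Pidgin
# words. Illustrative only; not taken from the paper's data or rules.
VARIANT_TABLE: dict[str, list[str]] = {
    "dey": ["de"],
    "wetin": ["wettin"],
    "make": ["mek"],
    "the": ["di", "de"],
}


def respell(sentence: str, rng: random.Random, p: float = 0.5) -> str:
    """Swap each word found in VARIANT_TABLE for a random variant with probability p.

    Lookup is case-insensitive; replacements are emitted in lowercase.
    """
    out = []
    for tok in sentence.split():
        alts = VARIANT_TABLE.get(tok.lower())
        if alts and rng.random() < p:
            out.append(rng.choice(alts))
        else:
            out.append(tok)
    return " ".join(out)


def augment(pairs: list[tuple[str, str]], copies: int = 1, p: float = 0.5,
            seed: int = 0) -> list[tuple[str, str]]:
    """Return the original (source, target) pairs plus respelled copies.

    Only the Pidgin source is respelled; the target (a sentiment label or an
    English translation) is kept as-is.
    """
    rng = random.Random(seed)
    augmented = list(pairs)
    for source, target in pairs:
        for _ in range(copies):
            new_source = respell(source, rng, p)
            if new_source != source:  # skip copies identical to the original
                augmented.append((new_source, target))
    return augmented


if __name__ == "__main__":
    data = [("wetin dey happen", "neutral"), ("I no like am", "negative")]
    for source, target in augment(data, copies=1, p=1.0):
        print(source, "->", target)
```

Skipping unchanged copies avoids adding exact duplicates of the original examples. The replacement probability `p` and the number of `copies` are free parameters in this sketch, not values reported in the paper.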
