Implicit Discourse Relation Classification For Nigerian Pidgin

Muhammed Saeed; Peter Bourgonje; Vera Demberg

arXiv:2406.18776 · cs.CL · November 5, 2024 · 1 citation

TL;DR

This paper addresses implicit discourse relation classification for Nigerian Pidgin by comparing translation-based and synthetic corpus approaches, demonstrating that a native NP classifier significantly outperforms baseline methods.

Contribution

It introduces a novel synthetic corpus creation method for NP and shows that training a native classifier yields substantial performance improvements.

Findings

01 Synthetic NP corpus improves classification accuracy

02 Native NP classifier outperforms translation-based baseline

03 Significant F1 score gains in 4-way and 11-way classification

Abstract

Despite attempts to make Large Language Models multi-lingual, many of the world's languages are still severely under-resourced. This widens the performance gap between NLP and AI applications aimed at well-financed languages and those aimed at less-resourced languages. In this paper, we focus on Nigerian Pidgin (NP), which is spoken by nearly 100 million people, but has comparatively very few NLP resources and corpora. We address the task of Implicit Discourse Relation Classification (IDRC) and systematically compare two approaches: (1) translating NP data to English, applying a well-resourced IDRC tool, and back-projecting the labels; and (2) creating a synthetic discourse corpus for NP, in which we translate PDTB and project PDTB labels, and then training an NP IDR classifier. The latter approach of learning a "native" NP classifier outperforms our baseline by 13.27% and 33.98% in F$_1$ score for…
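The two pipelines compared in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `ArgPair` type, both function names, and the toy stand-ins (`str.upper` and `str.lower` as "translators", a keyword rule as "classifier") are assumptions made so the example runs without any external model. In practice the translators would be machine translation systems and the classifiers trained IDRC models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical types: a translator maps a string to a string,
# a classifier maps two argument strings to a relation label.
Translator = Callable[[str], str]
Classifier = Callable[[str, str], str]


@dataclass(frozen=True)
class ArgPair:
    """Two discourse arguments and an optional relation label."""
    arg1: str
    arg2: str
    label: Optional[str] = None


def classify_via_translation(pair: ArgPair,
                             np_to_en: Translator,
                             en_classifier: Classifier) -> ArgPair:
    """Baseline: translate NP arguments to English, classify them there,
    then attach (back-project) the predicted label to the original NP pair."""
    label = en_classifier(np_to_en(pair.arg1), np_to_en(pair.arg2))
    return ArgPair(pair.arg1, pair.arg2, label)


def build_synthetic_corpus(english_corpus: List[ArgPair],
                           en_to_np: Translator) -> List[ArgPair]:
    """Synthetic corpus: translate labelled English pairs (PDTB-style) into NP
    and carry each gold label over unchanged. A native NP classifier would
    then be trained on the returned list."""
    return [ArgPair(en_to_np(p.arg1), en_to_np(p.arg2), p.label)
            for p in english_corpus]


# Toy stand-ins (assumptions): upper case plays the role of "NP",
# lower case the role of "English".
def toy_en_classifier(arg1: str, arg2: str) -> str:
    return "Contingency" if arg2.startswith("so ") else "Expansion"


english_corpus = [ArgPair("it rained", "so we stayed in", "Contingency")]
synthetic = build_synthetic_corpus(english_corpus, str.upper)
projected = classify_via_translation(
    ArgPair("IT RAINED", "SO WE STAYED IN"), str.lower, toy_en_classifier)
```

The structural difference is where translation noise enters: the baseline translates at inference time and relies on an English model, whereas the synthetic-corpus route translates once at training time so that the final classifier operates directly on NP text.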

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus