Analysis and representation of Igbo text document for a text-based system
Ifeanyi-Reuben Nkechi J., Ugwu Chidiebere, Adegbola Tunde

TL;DR
This paper analyzes Igbo language text documents, highlighting the challenges of compounding and word order, and proposes using Bigram and Trigram N-gram models to improve text representation for better performance in text-based applications.
Contribution
It introduces an N-gram-based representation tailored to the Igbo language, addressing linguistic features such as compounding and collocation.
Findings
Bigram and trigram models capture more semantic information than the Bag-Of-Words model, which ignores word order.
N-gram models handle Igbo-specific features such as compounding, collocation and word order.
Improved text representation enhances performance in Igbo text-based applications.
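To make the contrast between Bag-Of-Words and N-gram representations concrete, here is a minimal sketch in Python. It is not the authors' implementation: the helper names (`ngrams`, `bag_of_words`, `ngram_counts`), the whitespace tokenization, and the sample phrase are all assumptions chosen for illustration. The phrase "ụlọ akwụkwọ" is a commonly cited Igbo compound (roughly "school") and is not taken from the paper's data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token tuples of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Unordered term counts: word order is discarded."""
    return Counter(tokens)


def ngram_counts(tokens, n):
    """Counts of contiguous n-grams: local word order is kept."""
    return Counter(ngrams(tokens, n))


# Illustrative input only (assumption): a two-word compound and the
# same two tokens in reversed order, split on whitespace.
doc_a = "ụlọ akwụkwọ".split()
doc_b = "akwụkwọ ụlọ".split()

# The two token sequences are indistinguishable under Bag-Of-Words...
same_bow = bag_of_words(doc_a) == bag_of_words(doc_b)

# ...but their bigram representations differ, because order is preserved.
same_bigrams = ngram_counts(doc_a, 2) == ngram_counts(doc_b, 2)

# Trigrams work the same way on longer sequences (placeholder tokens).
example_trigrams = ngrams(["w1", "w2", "w3", "w4"], 3)
```

In this sketch `same_bow` is true while `same_bigrams` is false, which is the word-order information the summary says Bag-Of-Words loses. A real system would need an Igbo-aware tokenizer and normalization step, which this example does not attempt.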
Abstract
The advancement of Information Technology (IT) has helped bring the three major Nigerian languages into text-based applications such as text mining, information retrieval and natural language processing. This paper focuses on the Igbo language, which uses compounding as a common type of word formation and has a large vocabulary of compound words. Collocation, word ordering and compounding play a major role in the Igbo language. The ambiguity in dealing with these compound words has made the representation of Igbo text documents very difficult, because it cannot be addressed using the most common and standard approach, the Bag-Of-Words (BOW) model of text representation, which ignores word order and relations. This is a cause for concern and creates the need for an improved model that captures this situation. This paper presents the analysis…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
