Comparative Analysis of N-gram Text Representation on Igbo Text Document Similarity

Nkechi Ifeanyi-Reuben; Chidiebere Ugwu; Nwachukwu E.O

arXiv:2004.00375 · cs.CL · August 5, 2020 · 1 citation

Open Access

TL;DR

This study compares unigram and bigram text representations for Igbo document similarity, finding that bigram models provide more accurate similarity measures, which can improve Igbo text processing tasks.

Contribution

It introduces a comparative analysis of n-gram models for Igbo text similarity, highlighting the effectiveness of bigram representation over unigram.

Findings

01 Bigram models yield lower distance values, indicating higher similarity.

02 Igbo text similarity is more accurate with bigram representation.

03 The study demonstrates the effectiveness of bigram models for Igbo text tasks.

Abstract

Advances in information technology have encouraged the use of Igbo in creating online text such as resources and news articles. Text similarity is of great importance in any text-based application. This paper presents a comparative analysis of n-gram text representation on Igbo text document similarity. It adopts the Euclidean similarity measure to determine the similarities between Igbo text documents represented with two word-based n-gram text representation models (unigram and bigram). The evaluation of the similarity measure is based on the adopted text representation models. The model is designed with Object-Oriented Methodology and implemented in the Python programming language with tools from the Natural Language Toolkit (NLTK). The results show that unigram-represented text has the highest distance values, whereas bigram-represented text has the lowest corresponding distance values. The lower…
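The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, raw n-gram counts as the document vectors, and plain standard-library Python in place of the NLTK tools the paper mentions. The function names `word_ngrams` and `euclidean_distance` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of a text as a list of tuples.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever tokenization the paper actually used.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def euclidean_distance(doc_a, doc_b, n):
    """Euclidean distance between the n-gram count vectors of two documents.

    Assumption: each document is represented by raw n-gram counts over
    the union of the n-grams found in the two documents.
    """
    counts_a = Counter(word_ngrams(doc_a, n))
    counts_b = Counter(word_ngrams(doc_b, n))
    vocabulary = set(counts_a) | set(counts_b)
    # Counter returns 0 for n-grams missing from a document.
    squared = sum((counts_a[g] - counts_b[g]) ** 2 for g in vocabulary)
    return math.sqrt(squared)


if __name__ == "__main__":
    # Toy sentences for illustration only; they are not from the paper's corpus.
    doc_1 = "ọ na-aga ụlọ akwụkwọ"
    doc_2 = "ọ na-aga ahịa"
    for n, label in ((1, "unigram"), (2, "bigram")):
        print(label, euclidean_distance(doc_1, doc_2, n))
```

A distance of zero means the two count vectors are identical, and larger values mean the documents share fewer n-grams. How the unigram and bigram distances compare depends on the documents and on any preprocessing or weighting applied, which this sketch does not attempt to reproduce.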

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems