Amharic Text Clustering Using Encyclopedic Knowledge with Neural Word Embedding

Dessalew Yohannes; Yeregal Assabie

arXiv:2105.00809·cs.CL·September 23, 2022·AfricaNLP·1 cites

TL;DR

This paper presents an approach for clustering Amharic text documents that enriches document features with Encyclopedic Knowledge (EK) and neural word embeddings, and reports improved clustering accuracy.

Contribution

The study combines EK with neural word embeddings for Amharic text clustering, demonstrates improved accuracy, and analyzes how class size affects the results.

Findings

1. EK with word embedding improves clustering accuracy
2. Changing class size significantly affects accuracy
3. System tested on Amharic Wikipedia data

Abstract

In this digital era, people in almost every discipline use automated systems that generate information in the form of documents written in different natural languages. As a result, there is growing interest in better solutions for finding, organizing and analyzing these documents. In this paper, we propose a system that clusters Amharic text documents using Encyclopedic Knowledge (EK) with neural word embedding. EK enables the representation of related concepts, and neural word embedding allows us to handle the contexts of that relatedness. During the clustering process, all text documents pass through preprocessing stages. Enriched features are extracted from each document by mapping it to EK and to the word embedding model, and a TF-IDF weighted vector of the enriched features is generated. Finally, the text documents are clustered using the popular spherical K-means algorithm. The…
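The pipeline described in the abstract (preprocessing, feature enrichment through EK and word embeddings, TF-IDF weighting, spherical K-means) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the stopword list, the EK map, the 2-D embeddings, the neighbour count `k` and all helper names are hypothetical stand-ins, and English tokens replace Amharic ones for readability.

```python
import math
from collections import Counter

# Hypothetical stand-ins (NOT the paper's resources): a tiny stopword list, a toy
# Encyclopedic Knowledge (EK) map of concept -> related concepts, and toy 2-D
# "word embeddings". English tokens are used instead of Amharic for readability.
STOPWORDS = {"a", "the", "of", "in"}
EK = {"football": ["sport", "goal"], "election": ["politics", "vote"]}
EMBEDDINGS = {
    "football": [0.9, 0.1], "match": [0.85, 0.2],
    "election": [0.1, 0.9], "parliament": [0.2, 0.85],
}

def preprocess(text):
    """Lowercase, split on whitespace and drop stopwords (stand-in for Amharic preprocessing)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def enrich(tokens, k=1):
    """Append EK-related concepts and the k nearest embedding neighbours of each token."""
    enriched = list(tokens)
    for t in tokens:
        enriched.extend(EK.get(t, []))
        if t in EMBEDDINGS:
            neighbours = sorted(
                ((cosine(EMBEDDINGS[t], v), w) for w, v in EMBEDDINGS.items() if w != t),
                reverse=True,
            )
            enriched.extend(w for _, w in neighbours[:k])
    return enriched

def tfidf_vectors(docs):
    """L2-normalised TF-IDF vectors over the enriched vocabulary (smoothed IDF)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def spherical_kmeans(vectors, k, iters=20):
    """K-means on the unit sphere: assign by dot product (= cosine), renormalise centroids."""
    centroids = [list(v) for v in vectors[:k]]  # simple deterministic initialisation
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                mean = [sum(col) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in mean)) or 1.0
                centroids[c] = [x / norm for x in mean]
    return labels

def cluster_documents(texts, k):
    docs = [enrich(preprocess(t)) for t in texts]
    return spherical_kmeans(tfidf_vectors(docs), k)

TEXTS = [
    "The football match ended in a goal",
    "Parliament passed the election law",
    "A football sport report",
    "Vote of the parliament",
]

if __name__ == "__main__":
    print(cluster_documents(TEXTS, k=2))  # -> [0, 1, 0, 1]
```

Because every TF-IDF vector is L2-normalised, the dot product used for assignment equals cosine similarity, which is what distinguishes spherical K-means from ordinary Euclidean K-means. Enrichment matters here: "A football sport report" and "Vote of the parliament" share few raw words with the other documents, but the added EK concepts and embedding neighbours give them overlapping features with the right cluster.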

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques