Amharic Text Clustering Using Encyclopedic Knowledge with Neural Word Embedding
Dessalew Yohannes, Yeregal Assabie

TL;DR
This paper presents a novel approach for clustering Amharic text documents by integrating Encyclopedic Knowledge with neural word embeddings, enhancing clustering accuracy over traditional methods.
Contribution
The study introduces a method for Amharic text clustering that combines Encyclopedic Knowledge (EK) with neural word embeddings, demonstrating improved accuracy and analyzing the impact of class size on results.
Findings
EK with word embedding improves clustering accuracy
Changing class size significantly affects accuracy
System tested on Amharic Wikipedia data
Abstract
In this digital era, people in almost every discipline use automated systems that generate information as documents in different natural languages. As a result, there is growing interest in better solutions for finding, organizing, and analyzing these documents. In this paper, we propose a system that clusters Amharic text documents using Encyclopedic Knowledge (EK) with neural word embedding. EK enables the representation of related concepts, and neural word embedding allows us to handle the contexts of the relatedness. During the clustering process, all text documents pass through preprocessing stages. Enriched features are extracted from each document by mapping it to EK and the word embedding model. A TF-IDF weighted vector of the enriched features is then generated. Finally, the text documents are clustered using the popular spherical K-means algorithm. The…
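The pipeline outlined in the abstract (feature enrichment, TF-IDF weighting, spherical K-means) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `related` dictionary is a hypothetical stand-in for the EK and word-embedding lookup, the English tokens are placeholders for preprocessed Amharic terms, and the smoothed IDF formula and the fixed initial centroids are assumptions made only to keep the example small and deterministic.

```python
import math
from collections import Counter


def enrich(tokens, related):
    """Append related concepts to a token list.

    `related` is a hypothetical stand-in for the EK / embedding lookup.
    """
    extra = [r for t in tokens for r in related.get(t, [])]
    return tokens + extra


def normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tfidf_unit_vectors(docs):
    """Turn token lists into L2-normalised TF-IDF vectors."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for term, count in Counter(d).items():
            # Smoothed IDF (an assumption; the abstract does not give the formula).
            idf = math.log((1 + n) / (1 + df[term])) + 1.0
            vec[index[term]] = count * idf
        vectors.append(normalize(vec))
    return vocab, vectors


def spherical_kmeans(vectors, k, init_indices, max_iters=20):
    """Cluster unit vectors by cosine similarity.

    Centroids start as copies of the documents at `init_indices` and are
    re-normalised to unit length after every update.
    """
    centroids = [vectors[i][:] for i in init_indices]
    labels = None
    for _ in range(max_iters):
        # Assign each document to the centroid with the highest cosine similarity.
        new_labels = [
            max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each centroid as the normalised sum of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Placeholder documents and a hypothetical relatedness table.
related = {"goal": ["sport"], "vote": ["politics"]}
raw_docs = [
    ["football", "match", "goal", "team"],
    ["team", "goal", "league", "football"],
    ["election", "parliament", "vote", "party"],
    ["party", "vote", "minister", "election"],
]
docs = [enrich(d, related) for d in raw_docs]
vocab, vectors = tfidf_unit_vectors(docs)
labels, centroids = spherical_kmeans(vectors, k=2, init_indices=[0, 2])
print(labels)
```

Because every vector has unit length, the dot product equals cosine similarity, which is what distinguishes spherical K-means from the Euclidean variant. A real system would typically use random or K-means++ style initialisation rather than hand-picked starting documents.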
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques
