Clustering Algorithm for Gujarati Language
Miral Patel, Prem Balani

TL;DR
This paper presents a new clustering algorithm tailored for Gujarati language words, aiming to improve preprocessing for stemming in natural language processing tasks.
Contribution
The paper introduces a novel clustering algorithm designed specifically for Gujarati words, addressing the case where the number of clusters is not known in advance during NLP preprocessing.
Findings
Successfully clustered 50,000 Gujarati words
Enhanced preprocessing for stemming in NLP
Demonstrated effectiveness of the proposed algorithm
Abstract
Natural language processing remains an active research area and now draws attention from researchers worldwide. It involves analyzing a language according to its structure and then tagging each word with its grammatical category. We have a set of 50,000 tagged Gujarati words, which we cluster using our own proposed algorithm. Many clustering techniques are available, for example single linkage, complete linkage, and average linkage. Here the number of clusters to be formed is not known in advance, so it depends on the data set provided. Clustering serves as a preprocessing step for stemming, the process of extracting the root from a word. For example, cats = cat + s, where cat is a noun and s marks the plural form.
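The abstract does not describe the proposed algorithm itself, so the following is only a minimal sketch of the general idea it mentions: single-linkage clustering of words when the number of clusters is not fixed in advance. The distance measure (`prefix_distance`), the function names, and the `threshold` value are all assumptions chosen for illustration, not the authors' method. Cutting single linkage at a distance threshold is equivalent to finding connected components among words whose pairwise distance is at most that threshold, which is what the sketch does.

```python
from itertools import combinations


def common_prefix_len(a, b):
    """Number of leading characters (code points) shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_distance(a, b):
    """Hypothetical distance: 1 - shared prefix length / longer word length."""
    if not a and not b:
        return 0.0
    return 1.0 - common_prefix_len(a, b) / max(len(a), len(b))


def single_linkage_clusters(words, threshold):
    """Group words by single linkage, cut at `threshold`.

    The number of clusters is not an input; it falls out of the data
    and the threshold. Implemented with a small union-find structure.
    """
    parent = list(range(len(words)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of any two words that are close enough.
    for i, j in combinations(range(len(words)), 2):
        if prefix_distance(words[i], words[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, word in enumerate(words):
        clusters.setdefault(find(i), []).append(word)
    return sorted(sorted(c) for c in clusters.values())


# English stand-ins for illustration, following the abstract's "cats" example.
words = ["cats", "dog", "walked", "cat", "dogs", "walk"]
clusters = single_linkage_clusters(words, threshold=0.4)
print(clusters)
```

Each resulting cluster groups a candidate root with its inflected forms, which is the sense in which clustering can precede stemming. The pairwise loop is quadratic in the number of words, so a 50,000-word set would need a more scalable neighbor search than this sketch uses.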
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling
