Clustering Algorithm for Gujarati Language

Miral Patel; Prem Balani

arXiv:1307.5393 · cs.CL · July 23, 2013 · 1 citation

Open Access

TL;DR

This paper presents a new clustering algorithm tailored to Gujarati words, aiming to improve preprocessing for stemming in natural language processing tasks.

Contribution

The paper introduces a novel clustering algorithm designed specifically for Gujarati words, addressing the challenge that the number of clusters is not known in advance during NLP preprocessing.
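Because the number of clusters is not known in advance, one common way to avoid fixing it is agglomerative clustering with a distance cutoff: start with every word in its own cluster and keep merging the closest pair until no pair is closer than a threshold. The sketch below is not the authors' algorithm, which the abstract does not specify. It is a generic illustration that assumes Levenshtein edit distance between words, a hypothetical `threshold` parameter, and the single, complete, and average linkage criteria named in the abstract. English words are used for readability; the same code runs on Gujarati strings, compared code point by code point.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def linkage_distance(c1, c2, words, linkage):
    """Distance between two clusters (lists of word indices) under a linkage criterion."""
    dists = [edit_distance(words[i], words[j]) for i in c1 for j in c2]
    if linkage == "single":
        return min(dists)
    if linkage == "complete":
        return max(dists)
    if linkage == "average":
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative_cluster(words, threshold, linkage="average"):
    """Merge the closest pair of clusters until no pair is within `threshold`.

    `threshold` is a hypothetical tuning parameter: the number of clusters is
    not fixed in advance but falls out of where merging stops.
    """
    clusters = [[i] for i in range(len(words))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage_distance(clusters[a], clusters[b], words, linkage)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        # combinations() yields a < b, so deleting index b leaves index a valid.
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [[words[i] for i in c] for c in clusters]
```

With `threshold=2` and single linkage, `["cat", "cats", "play", "played", "playing"]` yields `[["cat", "cats"], ["play", "played"], ["playing"]]`: `playing` stays alone because its distance to both `play` and `played` is 3. The exhaustive pairwise search is roughly cubic in the number of words, so a set of 50,000 words would need a more efficient approach.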

Findings

01

Successfully clustered 50,000 Gujarati words

02

Enhanced preprocessing for stemming in NLP

03

Demonstrated effectiveness of the proposed algorithm

Abstract

Natural language processing is still an active area of research, and it now attracts researchers worldwide. It involves analyzing a language based on its structure and then tagging each word with its appropriate grammatical category. Here we take a set of 50,000 tagged Gujarati words and cluster them using an algorithm we define ourselves. Many clustering techniques are available, for example single linkage, complete linkage, and average linkage. In this setting the number of clusters to be formed is not known in advance; it depends entirely on the data set provided. Clustering is a preprocessing step for stemming. Stemming is the process of extracting the root of a word. For example, cats = cat + s, meaning the noun cat in its plural form.
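The abstract's example, cats = cat + s, is the kind of split a stemmer produces. A minimal suffix-stripping sketch follows. It assumes a small hypothetical suffix table (`SUFFIXES`), a longest-match rule, and a hypothetical `min_stem` length guard; it only illustrates separating a root from an inflectional suffix and does not model Gujarati morphology or the authors' method.

```python
# Hypothetical suffix table for illustration only; a real Gujarati stemmer
# would need suffixes drawn from Gujarati morphology, which this does not model.
SUFFIXES = {
    "s": "plural",
    "ed": "past",
    "ing": "progressive",
}


def strip_suffix(word, suffixes=SUFFIXES, min_stem=2):
    """Return (stem, suffix, tag) using the longest matching suffix.

    Returns (word, "", None) when no listed suffix matches, or when stripping
    would leave a stem shorter than `min_stem` characters.
    """
    # Try longer suffixes first so "ing" is not shadowed by a shorter match.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], suffix, suffixes[suffix]
    return word, "", None
```

For example, `strip_suffix("cats")` returns `("cat", "s", "plural")`, while the `min_stem` guard keeps a short word such as `is` from being reduced to a one-letter stem.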

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Topic Modeling