Bangla Word Clustering Based on Tri-gram, 4-gram and 5-gram Language Model
Dipaloke Saha, Md Saddam Hossain, MD. Saiful Islam, Sabir Ismail

TL;DR
This paper explores Bangla word clustering using tri-gram, 4-gram, and 5-gram language models to support downstream NLP tasks, comparing the models' effectiveness on a large corpus using machine learning techniques.
Contribution
It introduces the application of n-gram based word clustering specifically for Bangla, addressing resource limitations and evaluating different n-gram models.
Findings
Preliminary implementation of Bangla word clustering using n-gram models.
Comparison of tri-gram, 4-gram, and 5-gram models to identify the most effective.
Use of machine learning techniques on a large Bangla corpus for clustering.
Abstract
In this paper, we describe a method that generates Bangla word clusters on the basis of semantic and contextual similarity. Word clustering is important for part-of-speech (POS) tagging, word sense disambiguation, text classification, recommender systems, spell checkers, grammar checkers, knowledge discovery, and many other Natural Language Processing (NLP) applications. Word clustering methods have already been implemented efficiently for English and some other languages. Due to a lack of resources, however, word clustering in Bangla has not yet been implemented efficiently, and its implementation is still at an early stage. Some research on word clustering in English, based on the five words preceding and following a key word, has reported efficient results. Now, we are trying to implement the tri-gram, 4-gram…
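The abstract is truncated before it describes the clustering procedure, so the following is only a minimal sketch of one way context-based clustering with n-grams could work, not the authors' method. It assumes that two words belong to the same candidate cluster when they appear with identical surrounding words inside an n-gram window. The function names `ngrams` and `context_clusters`, the choice of the token at position `n // 2` as the key word, the `min_size` threshold, and the toy corpus are all hypothetical.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return every contiguous window of n tokens from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def context_clusters(sentences, n=3, min_size=2):
    """Group words that occur in identical n-gram contexts.

    Assumption: the key word is the token at index n // 2 of each window,
    and the remaining n - 1 tokens form its context.
    """
    by_context = defaultdict(set)
    key = n // 2
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization
        for gram in ngrams(tokens, n):
            context = gram[:key] + gram[key + 1:]
            by_context[context].add(gram[key])
    # Keep only contexts shared by at least min_size distinct words.
    return {ctx: words for ctx, words in by_context.items()
            if len(words) >= min_size}


# Toy corpus for illustration only (not taken from the paper).
corpus = [
    "আমি ভাত খাই",
    "আমি রুটি খাই",
    "আমি মাছ খাই",
    "সে বই পড়ে",
]
clusters = context_clusters(corpus, n=3)
```

With this toy corpus and `n=3`, the three words that occur between "আমি" and "খাই" end up in one group, while the fourth sentence contributes no cluster because its context is seen only once. Passing `n=4` or `n=5` widens the context window, which on a real corpus would be expected to give fewer but more specific groups.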
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
