On Language Clustering: A Non-parametric Statistical Approach
Anagh Chattopadhyay, Soumya Sankar Ghosh, Samir Karmakar

TL;DR
This paper introduces a nonparametric statistical framework for language clustering, utilizing data depth and multidimensional analysis to improve language classification and outlier detection without relying on distributional assumptions.
Contribution
It presents a novel nonparametric approach to structuring language families using data depth and multidimensional scaling, making the clustering more robust and enabling re-evaluation of existing language classifications.
Findings
Effective outlier detection in language data
Improved language clustering accuracy
Re-evaluation of existing language classifications
Abstract
Any approach aimed at characterizing and quantifying a particular phenomenon must include the use of robust statistical methodologies for data analysis. With this in mind, the purpose of this study is to present statistical approaches that may be employed in nonparametric nonhomogeneous data frameworks, as well as to examine their application in the field of natural language processing and language clustering. Furthermore, this paper discusses the many uses of nonparametric approaches in linguistic data mining and processing. The data depth idea allows for the centre-outward ordering of points in any dimension, resulting in a new nonparametric multivariate statistical analysis that does not require any distributional assumptions. The concept of hierarchy is used in historical language categorisation and structuring, and it aims to organise and cluster languages into subfamilies using the…
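As an illustration of the centre-outward ordering mentioned in the abstract, the sketch below ranks a few points by depth and treats the shallowest point as an outlier candidate. It is a minimal example under stated assumptions, not the authors' method: the abstract does not say which depth function the paper uses, so Mahalanobis depth is swapped in here because it is short to write down. The function names and the toy `features` array are hypothetical and do not come from the paper.

```python
import numpy as np


def mahalanobis_depth(points):
    """Mahalanobis depth of each row of `points` relative to the whole sample.

    depth(x) = 1 / (1 + (x - mean)^T Cov^{-1} (x - mean)),
    so central points get values near 1 and remote points values near 0.
    This depth function is an assumption made for the example.
    """
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = points - mean
    # Squared Mahalanobis distance of every row, computed in one pass.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


def centre_outward_order(points):
    """Return row indices sorted from deepest (most central) to shallowest."""
    depth = mahalanobis_depth(points)
    return np.argsort(-depth), depth


# Hypothetical toy data: each row stands in for one language's feature vector.
features = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
    [6.0, 6.0],   # deliberately isolated point
])

order, depth = centre_outward_order(features)
print("centre-outward order:", order)
print("depth values:", np.round(depth, 3))
```

In this toy sample the isolated row comes last in the ordering, which is the sense in which a depth ranking can flag outliers. A robust depth such as halfspace (Tukey) depth would avoid relying on the sample mean and covariance, at a higher computational cost.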
Taxonomy
Topics: Natural Language Processing Techniques · Bayesian Methods and Mixture Models · Text and Document Classification Technologies
