Linguistic Classification using Instance-Based Learning
Priya S. Nayak, Rhythm Girdhar, Shreekanth M. Prabhu

TL;DR
This paper introduces an instance-based learning approach to classify words by language, challenging traditional tree-based models and proposing a network perspective for understanding linguistic relationships, especially in Indian languages.
Contribution
The work applies instance-based learning with a custom linguistic distance metric to classify words and explore language relationships beyond traditional tree models, emphasizing network analysis.
Findings
Effective classification of Indian language words
Potential to reveal complex language relationships
Clustering coefficients as quality metrics
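The listing does not say how clustering coefficients are computed or applied, so the following is only a minimal sketch of the standard local clustering coefficient on an undirected graph. The graph, the node names, and the function name `local_clustering` are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbours that are themselves linked.

    `adj` maps each node to the set of its neighbours (undirected graph).
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbours there are no pairs to check.
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical toy graph: a triangle a-b-c with one extra node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
scores = {node: local_clustering(adj, node) for node in adj}
```

In a word network, a value near 1 would indicate a tightly knit group of neighbours, which is one plausible reading of how such a coefficient could serve as a quality signal.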
Abstract
Traditionally, linguists have organized the languages of the world into language families modelled as trees. In this work we take a contrarian approach and question the tree-based model, which is rather restrictive. For example, the affinity that Sanskrit independently has with languages across the Indo-European family is better illustrated using a network model. The same holds for the languages of India, where inter-relationships are better discovered than assumed. To enable such discovery, we use instance-based learning techniques to assign language labels to words. We vocalize each word and then classify it using a custom linguistic distance metric that measures the word against training sets containing language labels. We construct the training sets by making use of word clusters and assigning a language and category label…
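The classification step described in the abstract can be sketched as a nearest-neighbour vote. This is a sketch under stated assumptions, not the authors' method: the paper's custom linguistic distance metric is not given here, so plain Levenshtein edit distance stands in for it; the vocalization step is skipped and raw strings are compared; the training pairs and the labels `lang_A` and `lang_B` are hypothetical toy data.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance; a stand-in for the paper's custom metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def classify(word, training, k=3):
    """Label `word` by majority vote among its k nearest training words."""
    nearest = sorted(training, key=lambda pair: edit_distance(word, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled instances (word, language label).
training = [
    ("pani", "lang_A"),
    ("paani", "lang_A"),
    ("neeru", "lang_B"),
    ("niru", "lang_B"),
    ("nir", "lang_B"),
]

label_1 = classify("panii", training)
label_2 = classify("neer", training)
```

Because instance-based learning stores the labelled examples and defers all work to query time, swapping in a different distance function changes the classifier without any retraining.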
Taxonomy
Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Authorship Attribution and Profiling
