Language Identification of Devanagari Poems
Priyankit Acharya, Aditya Ku. Pathak, Rakesh Ch. Balabantaray, and Anil Ku. Singh

TL;DR
This paper presents a machine learning approach for automatic language identification of Devanagari poems across ten Indian languages, aiding in poem analysis and linguistic research.
Contribution
It introduces a novel procedure for identifying the language of Devanagari poems using supervised and deep learning methods, focusing on lexical similarity among ten languages.
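To make the supervised side of this concrete, here is a minimal sketch of one common baseline for language identification: a multinomial naive Bayes classifier over character bigrams with add-one smoothing. This is an illustrative assumption, not the authors' system; the class name `NgramNaiveBayes`, the labels `lang_a` and `lang_b`, and the toy strings are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams (over Unicode code points)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical baseline: multinomial naive Bayes on character n-grams."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()         # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total = sum(self.counts[label].values())
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((self.counts[label][gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training strings with placeholder labels (not real corpus data).
model = NgramNaiveBayes(n=2).fit(
    ["कमल कमल कमल", "पवन पवन पवन"],
    ["lang_a", "lang_b"],
)
predicted = model.predict("कमल")
```

Character n-grams are a frequent choice for closely related languages because they pick up on spelling and suffix patterns even when whole words are shared. Note that Devanagari vowel signs are separate code points, so these n-grams do not align with visual syllables.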
Findings
Supervised machine learning models achieved high accuracy in language identification.
Deep learning techniques outperformed traditional models in classification tasks.
The study provides a comprehensive dataset of Devanagari poems for ten Indian languages.
Abstract
Language identification is an important part of many text processing pipelines, and the field has been researched extensively. This paper proposes a procedure for automatic language identification of poems, as a step in poem analysis, covering 10 Devanagari-based languages of India: Angika, Awadhi, Braj, Bhojpuri, Chhattisgarhi, Garhwali, Haryanvi, Hindi, Magahi, and Maithili. We collated corpora of poems of varying length and studied the similarity of poems among the 10 languages at the lexical level. Finally, various language identification systems based on supervised machine learning and deep learning techniques are applied and evaluated.
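As an illustration of what a lexical-level similarity study can look like, the sketch below computes the Jaccard overlap between the vocabularies of each pair of languages. The abstract does not say which similarity measure the authors used, so Jaccard is an assumption here, and the function names, placeholder labels, and toy lines are hypothetical.

```python
from itertools import combinations


def vocabulary(poems):
    """Return the set of whitespace-separated tokens across a list of poems."""
    vocab = set()
    for poem in poems:
        vocab.update(poem.split())
    return vocab


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_lexical_similarity(corpora):
    """Map each unordered language pair to the Jaccard overlap of their vocabularies."""
    vocabs = {lang: vocabulary(poems) for lang, poems in corpora.items()}
    return {
        (l1, l2): jaccard(vocabs[l1], vocabs[l2])
        for l1, l2 in combinations(sorted(vocabs), 2)
    }


# Toy, made-up lines with placeholder labels (not drawn from the paper's corpora).
toy_corpora = {
    "lang_a": ["राम नाम सत्य", "राम गए वन"],
    "lang_b": ["राम नाम जप", "वन में फूल"],
}
similarity = pairwise_lexical_similarity(toy_corpora)
```

In this toy data the two vocabularies share 3 of 8 distinct tokens, so the pair scores 0.375. High overlap between two languages suggests that word-level features alone may struggle to separate them.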
Taxonomy
Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Translation Studies and Practices
