TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi
Mohammed Amaan Dhamaskar, Rasika Ransing

TL;DR
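This paper presents TriNER, a multilingual NER model for Hindi, Bengali, and Marathi. The authors train a custom transformer model and fine-tune several pretrained models, reaching an F1 score of 92.11 over 6 entity groups and reducing inconsistencies in entity groups and tag names across the three languages.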
Contribution
Introduces a unified NER model for three Indian languages, improving consistency and performance in entity recognition tasks.
Findings
F1 score of 92.11 over 6 entity groups, covering Hindi, Bengali, and Marathi
Reduces inconsistencies in entity groups and tag names across the three languages
Demonstrates effectiveness of transformer-based models
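To make the reported metric concrete, the following is a minimal sketch of one common way to compute F1 for NER: exact-match, entity-level scoring over BIO-tagged sequences. This summary does not state whether the 92.11 figure is token-level or entity-level, or how it is averaged, so the scoring scheme here is an assumption, and the function names `extract_spans` and `span_f1` are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, group) spans in a BIO tag sequence."""
    spans, start, group = set(), None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != group
        )
        if start is not None and (tag == "O" or starts_new):
            spans.add((start, i, group))
            start, group = None, None
        if starts_new:
            start, group = i, tag[2:]
    return spans


def span_f1(gold, pred):
    """Exact-match entity-level F1 (assumed scheme, not confirmed by the paper)."""
    g, p = extract_spans(gold), extract_spans(pred)
    true_positives = len(g & p)
    precision = true_positives / len(p) if p else 0.0
    recall = true_positives / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
score = span_f1(gold, pred)  # one of two gold entities found, no false positives
```

Under this scheme a predicted entity counts only if both its boundaries and its group match the gold annotation, which is stricter than per-token accuracy.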
Abstract
India's rich cultural and linguistic diversity poses various challenges in the domain of Natural Language Processing (NLP), particularly in Named Entity Recognition (NER). NER is an NLP task that aims to identify and classify tokens into different entity groups like Person, Location, Organization, Number, etc. This makes NER very useful for downstream tasks like context-aware anonymization. This paper details our work to build a multilingual NER model for the three most spoken languages in India: Hindi, Bengali & Marathi. We train a custom transformer model and fine-tune a few pretrained models, achieving an F1 score of 92.11 for a total of 6 entity groups. Through this paper, we aim to introduce a single model to perform NER and significantly reduce the inconsistencies in entity groups and tag names across the three languages.
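The inconsistency in tag names that the abstract mentions can be illustrated with a small sketch that maps dataset-specific BIO labels onto one shared label set. The source labels (`PER`, `LOC`, `ORG`, `NUM`) and the unified names are hypothetical, chosen only for illustration. Only the four groups named in the abstract appear, since this summary does not list the paper's full set of six.

```python
# Hypothetical mapping from dataset-specific group names to one shared scheme.
# The actual TriNER tag set is not given in this summary.
UNIFIED = {
    "PER": "PERSON", "PERSON": "PERSON",
    "LOC": "LOCATION", "LOCATION": "LOCATION",
    "ORG": "ORGANIZATION", "ORGANIZATION": "ORGANIZATION",
    "NUM": "NUMBER", "NUMBER": "NUMBER",
}


def normalize_tag(tag):
    """Map a BIO tag such as 'B-PER' to the shared scheme, e.g. 'B-PERSON'."""
    if tag == "O":
        return tag
    prefix, _, group = tag.partition("-")
    if prefix not in ("B", "I") or group.upper() not in UNIFIED:
        raise ValueError(f"unknown tag: {tag!r}")
    return f"{prefix}-{UNIFIED[group.upper()]}"


def normalize_sequence(tags):
    """Normalize every tag in one sentence's label sequence."""
    return [normalize_tag(t) for t in tags]


unified = normalize_sequence(["B-PER", "I-PER", "O", "B-LOC"])
```

Raising an error on unknown labels, rather than silently mapping them to `O`, keeps annotation mismatches between corpora visible when datasets are merged.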
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
