TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi

Mohammed Amaan Dhamaskar; Rasika Ransing

arXiv:2502.04245·cs.CL·February 7, 2025



TL;DR

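This paper presents TriNER, a multilingual NER model for Hindi, Bengali, and Marathi. The authors train a custom transformer model and fine-tune several pretrained models, reaching an F1 score of 92.11 over 6 entity groups while reducing inconsistencies in entity groups and tag names across the three languages.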

Contribution

Introduces a single NER model for Hindi, Bengali, and Marathi, intended to keep entity groups and tag names consistent across the three languages.

Findings

01. F1 score of 92.11 over 6 entity groups across the three languages

02. Reduces inconsistencies in entity groups and tag names across languages

03. Demonstrates effectiveness of transformer-based models

Abstract

India's rich cultural and linguistic diversity poses various challenges in the domain of Natural Language Processing (NLP), particularly in Named Entity Recognition (NER). NER is an NLP task that aims to identify and classify tokens into different entity groups like Person, Location, Organization, Number, etc. This makes NER very useful for downstream tasks like context-aware anonymization. This paper details our work to build a multilingual NER model for the three most spoken languages in India: Hindi, Bengali & Marathi. We train a custom transformer model and fine-tune a few pretrained models, achieving an F1 score of 92.11 for a total of 6 entity groups. Through this paper, we aim to introduce a single model to perform NER and significantly reduce the inconsistencies in entity groups and tag names across the three languages.
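To make the reported metric concrete, the sketch below shows one common way an F1 score for NER can be computed: token tags are grouped into entity spans, and predicted spans are compared with gold spans. This is an illustration only. The abstract does not state the tagging scheme or the averaging method used, so the BIO scheme, the exact-match micro-averaged F1, the tag names `PER` and `LOC`, and the function names `bio_to_spans` and `entity_f1` are all assumptions rather than the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (entity group, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Group BIO token tags into entity spans (BIO scheme is an assumption)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray "I-" tag is treated leniently as the start of a new span.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact span matches (group and boundaries)."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical four-token sentence with one Person and one Location entity.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)
```

In this hypothetical example the prediction finds the Person span but misses the Location span, so precision is 1.0, recall is 0.5, and `score` is 2/3. A reported value such as 92.11 would correspond to this kind of score expressed as a percentage, if the paper uses a comparable span-level metric.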


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies