Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

Sankalp Bahad; Pruthwik Mishra; Karunesh Arora; Rakesh Chandra Balabantaray; Dipti Misra Sharma; Parameswari Krishnamurthy

arXiv:2405.04829·cs.CL·May 13, 2024

Open Access

TL;DR

This paper introduces a new dataset and a fine-tuned multilingual NER model for Indian languages, addressing the lack of resources and challenges in processing these languages in NLP applications.

Contribution

It provides a human-annotated corpus of 40K sentences for four Indian languages and demonstrates a fine-tuned model achieving high F1 scores, improving NER for Indian languages.

Findings

01

Achieved an average F1 score of 0.80 on the dataset.

02

Achieves comparable performance on completely unseen benchmark datasets for Indian languages.

03

Analyzes the challenges of NER for Indian languages and proposes techniques tailored to multilingual NER.

Abstract

Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. Research on NER is centered on English and a few other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian Languages. We present a human-annotated named entity corpus of 40K sentences for 4 Indian languages from two of the major Indian language families. Additionally, we present a multilingual model fine-tuned on our dataset, which achieves an F1 score of 0.80 on our dataset on average. We achieve comparable performance on completely unseen benchmark datasets for Indian languages which affirms the…
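The abstract reports an F1 score averaged over the dataset but this page does not say how it is computed. The sketch below shows one common convention, offered only as an illustration: entity-level exact-match F1 over BIO tags, then an unweighted mean across languages. The tag set (`PER`, `LOC`), the language keys (`lang_a`, `lang_b`), and the toy sentences are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        # Assumption: a stray I- tag with no open span starts a new span.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match F1 over all sentences of one language."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = extract_spans(g), extract_spans(p)
        tp += len(gs & ps)  # a span counts only if boundaries and type match
        n_gold += len(gs)
        n_pred += len(ps)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def macro_average(per_language: Dict[str, float]) -> float:
    """Unweighted mean of per-language scores (assumed averaging scheme)."""
    return sum(per_language.values()) / len(per_language)


# Toy data for two hypothetical languages.
scores = {
    "lang_a": entity_f1(
        gold=[["B-PER", "I-PER", "O", "B-LOC"]],
        pred=[["B-PER", "I-PER", "O", "O"]],  # misses the LOC entity
    ),
    "lang_b": entity_f1(gold=[["B-LOC", "O"]], pred=[["B-LOC", "O"]]),
}
average_f1 = macro_average(scores)
```

If the reported average were instead weighted by the number of sentences or entities per language, `macro_average` would need per-language weights; the page gives no detail either way.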

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies