CL-NERIL: A Cross-Lingual Model for NER in Indian Languages

Akshara Prabhakar; Gouri Sankar Majumder; Ashish Anand

arXiv:2111.11815 · cs.CL · November 24, 2021



TL;DR

This paper introduces CL-NERIL, a cross-lingual NER framework for Indian languages that leverages parallel corpora and a teacher-student model to improve performance in low-resource settings.

Contribution

It proposes a novel annotation projection method combined with a teacher-student model to enhance NER in Indian languages using weakly labeled data.
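A minimal sketch of how such an annotation projection step might work, based on the abstract's description of combining a word alignment score with the NER tag prediction confidence on the English side. It assumes that the combined score is the product of the two, that `threshold` is a hypothetical cutoff, and that alignments arrive as hypothetical `(source_index, target_index, alignment_score)` triples. The function name `project_annotations` is illustrative, and the paper's actual scoring rule may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical alignment format: (source_index, target_index, alignment_score)
Alignment = Tuple[int, int, float]


def project_annotations(
    src_tags: List[str],
    src_conf: List[float],
    alignments: List[Alignment],
    tgt_len: int,
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Project BIO tags from English tokens onto target-language tokens.

    Assumption: combined score = alignment score * source tag confidence.
    """
    best: Dict[int, Tuple[float, str]] = {}
    for s, t, a_score in alignments:
        tag = src_tags[s]
        if tag == "O":
            continue  # only entity tags are projected; targets default to "O"
        score = a_score * src_conf[s]
        # Keep the highest-scoring entity tag for each target token.
        if score >= threshold and score > best.get(t, (0.0, "O"))[0]:
            best[t] = (score, tag)

    tgt_tags = [best[i][1] if i in best else "O" for i in range(tgt_len)]

    # Repair BIO consistency: an I- tag must continue an entity of the same type.
    for i, tag in enumerate(tgt_tags):
        if tag.startswith("I-"):
            prev = tgt_tags[i - 1] if i > 0 else "O"
            if prev == "O" or prev[2:] != tag[2:]:
                tgt_tags[i] = "B-" + tag[2:]
    return tgt_tags


# English: "Sachin lives in Mumbai"  ->  Hindi: "सचिन मुंबई में रहता है"
tags = project_annotations(
    ["B-PER", "O", "O", "B-LOC"],
    [0.95, 0.99, 0.98, 0.8],
    [(0, 0, 0.9), (1, 3, 0.85), (2, 2, 0.8), (3, 1, 0.7)],
    tgt_len=5,
)
print(tags)  # ['B-PER', 'B-LOC', 'O', 'O', 'O']
```

Weighting the alignment by the tagger's confidence lets a reliable alignment still be discarded when the English tag itself is uncertain: lowering the confidence for "Mumbai" to 0.6 gives a combined score of 0.42, so that token stays "O".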

Findings

1. At least 10% performance improvement over zero-shot models
2. Effective use of weakly labeled data to supplement source-language data
3. Framework applicable to multiple Indian languages

Abstract

Developing Named Entity Recognition (NER) systems for Indian languages has been a long-standing challenge, mainly owing to the need for a large number of clean, annotated training instances. This paper proposes an end-to-end framework for NER for Indian languages in a low-resource setting by exploiting parallel corpora of English and Indian languages and an English NER dataset. The proposed framework includes an annotation projection method that combines the word alignment score and the NER tag prediction confidence score on source language (English) data to generate weakly labeled data in a target Indian language. We employ a variant of the Teacher-Student model and optimize it jointly on the pseudo labels of the Teacher model and predictions on the generated weakly labeled data. We also present manually annotated test sets for three Indian languages: Hindi, Bengali, and Gujarati. We…
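The abstract states that the student is optimized jointly on the Teacher model's pseudo labels and on the generated weakly labeled data. One way to express such a joint objective is a weighted sum of two per-token cross-entropy terms, sketched here in plain Python (the released code is PyTorch; this sketch is not taken from it). The soft teacher distributions, the mixing weight `alpha`, and the name `joint_student_loss` are assumptions for illustration; the paper's Teacher-Student variant may define or weight these terms differently.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one token's tag logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def joint_student_loss(
    student_logits: List[List[float]],  # per token: one logit per tag
    teacher_probs: List[List[float]],   # per token: teacher's soft pseudo label
    weak_labels: List[int],             # per token: projected (weak) tag index
    alpha: float = 0.5,                 # hypothetical mixing weight
) -> float:
    """Average per-token loss for the student on one target-language sentence."""
    total = 0.0
    for logits, t_probs, y in zip(student_logits, teacher_probs, weak_labels):
        log_p = log_softmax(logits)
        # Distillation term: cross-entropy against the teacher's distribution.
        distill = -sum(t * lp for t, lp in zip(t_probs, log_p))
        # Weak-label term: cross-entropy against the projected tag.
        weak = -log_p[y]
        total += alpha * distill + (1 - alpha) * weak
    return total / len(weak_labels)


loss = joint_student_loss([[0.0, 0.0]], [[0.5, 0.5]], [0])
print(round(loss, 4))  # 0.6931
```

Setting `alpha` to 1 trains only on the teacher's pseudo labels and 0 only on the projected weak labels; intermediate values mix the two signals, which is the trade-off a joint objective of this kind exposes.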


Code & Models

Repositories

aksh555/cl-neril
PyTorch · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies