CMNEROne at SemEval-2022 Task 11: Code-Mixed Named Entity Recognition by leveraging multilingual data

Suman Dowlagar; Radhika Mamidi

arXiv:2206.07318·cs.CL·June 16, 2022

Open Access

TL;DR

This paper presents team CMNEROne's approach to code-mixed Named Entity Recognition for SemEval-2022 Task 11 (MultiCoNER). By leveraging multilingual data, the system reaches a weighted average F1 score of 0.7044, 6% above the baseline.

Contribution

The work applies multilingual data to code-mixed NER and reports improved performance over the shared-task baseline.

Findings

01

Achieved a weighted average F1 score of 0.7044, 6% above the baseline.

02

Demonstrated the effectiveness of multilingual data in code-mixed NER.

03

Improved over the baseline on the MultiCoNER code-mixed dataset.

Abstract

Identifying named entities is a practical and challenging task in Natural Language Processing. Named Entity Recognition (NER) on code-mixed text is even more challenging because of the linguistic complexity that mixing introduces. This paper describes the submission of team CMNEROne to SemEval-2022 Task 11, MultiCoNER. The code-mixed NER task aimed to identify named entities in a code-mixed dataset. Our work performs NER on the code-mixed dataset by leveraging multilingual data. We achieved a weighted average F1 score of 0.7044, i.e., 6% greater than the baseline.
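
For readers unfamiliar with the metric, the sketch below shows one common way a weighted average F1 score is computed: per-class F1 values are averaged with each class weighted by its number of gold instances (its support). This is an illustration only, not the shared task's official scorer. The function name `weighted_f1`, the flat label lists, and the toy labels are assumptions made for this example, and the abstract does not state whether scoring is done per token or per entity span.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Support-weighted average of per-class F1 over two equal-length label lists.

    Hypothetical helper for illustration; not the official MultiCoNER scorer.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0

    support = Counter(gold)  # number of gold instances per class
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p where it did not belong
            fn[g] += 1  # missed an instance of gold class g

    total = 0.0
    for label, n in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        total += f1 * n  # weight each class F1 by its support
    return total / len(gold)


# Toy labels, invented for this example only.
gold = ["PER", "LOC", "O", "O", "PER", "LOC"]
pred = ["PER", "O", "O", "O", "LOC", "LOC"]
print(round(weighted_f1(gold, pred), 4))
```

Because each class contributes in proportion to its support, frequent classes dominate the score, which is why a weighted average can differ noticeably from a macro average on imbalanced NER label sets.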

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification