TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla

Nazia Tasnim; Md. Istiak Hossain Shihab; Asif Shahriyar Sushmit; Steven Bethard; Farig Sadeque

arXiv:2204.09964 · cs.CL · April 22, 2022

TL;DR

This paper presents a system that uses an ensemble of Bangla-specific ELECTRA models and data augmentation to recognize complex, nested, and overlapping Named Entities in Bangla, mentions that traditional sequence-tagging schemes struggle to capture.

Contribution

The paper introduces a novel ensemble of Bangla-pretrained ELECTRA models, combined with data augmentation, for complex Named Entity recognition in Bangla.
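
How the ensemble members' predictions are combined is not specified here. As a minimal sketch, assuming every model emits one BIO tag per token over the same tokenization, token-level majority voting could merge their outputs as follows. The function name `majority_vote` and its tie-breaking rule are hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Merge per-token BIO tag sequences from several models by majority vote.

    Ties go to the tag predicted by the earliest model in `predictions`,
    so list order doubles as a priority ranking (a hypothetical convention).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all models must tag the same tokens")

    merged: List[str] = []
    for i in range(length):
        # Tags that each model assigned to token i.
        column = [seq[i] for seq in predictions]
        counts = Counter(column)
        top = max(counts.values())
        # First tag, in model order, that reaches the top count.
        merged.append(next(tag for tag in column if counts[tag] == top))
    return merged


if __name__ == "__main__":
    model_outputs = [
        ["B-PER", "I-PER", "O", "B-LOC"],
        ["B-PER", "O", "O", "B-LOC"],
        ["B-PER", "I-PER", "O", "B-CORP"],
    ]
    print(majority_vote(model_outputs))  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Per-token voting can produce an `I-` tag with no preceding `B-`, so a real system might add a repair step or vote over whole entity spans instead.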

Findings

01

Ensemble of Bangla ELECTRA models improves recognition accuracy.

02

Data augmentation enhances model robustness.

03

The system achieved competitive performance on Track 11 (Bangla) of SemEval 2022 Task 11.
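
The specific augmentation techniques are not described here. One widely used option for NER, mention replacement, swaps each entity for another mention of the same type drawn from a gazetteer and rewrites its BIO tags to match. The sketch below assumes that technique, a gazetteer mapping each entity type to a list of token-list mentions, and the hypothetical name `replace_mentions`; it illustrates the general idea, not the paper's procedure.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical gazetteer format: entity type -> list of mentions (token lists).
Gazetteer = Dict[str, List[List[str]]]


def replace_mentions(
    tokens: Sequence[str],
    tags: Sequence[str],
    gazetteer: Gazetteer,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Swap each BIO entity for a random same-type mention from `gazetteer`.

    Entities whose type has no non-empty gazetteer entry are kept as-is.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out_tokens: List[str] = []
    out_tags: List[str] = []
    i = 0
    while i < len(tokens):
        if tags[i].startswith("B-"):
            label = tags[i][2:]
            # Advance j over the I- tags that continue this entity.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + label:
                j += 1
            candidates = [m for m in gazetteer.get(label, []) if m]
            mention = list(rng.choice(candidates)) if candidates else list(tokens[i:j])
            out_tokens.extend(mention)
            out_tags.extend(["B-" + label] + ["I-" + label] * (len(mention) - 1))
            i = j
        else:
            # "O" tokens (and any stray I- tags) are copied unchanged.
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags


if __name__ == "__main__":
    toks = ["Sakib", "Al", "Hasan", "lives", "in", "Dhaka"]
    bio = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC"]
    print(replace_mentions(toks, bio, {"LOC": [["Chittagong"]]}, random.Random(0)))
```

With a gazetteer that only lists `Chittagong` under `LOC`, the location is replaced while the person mention is kept, because `PER` has no gazetteer entry. A seeded `random.Random` keeps the augmented corpus reproducible.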

Abstract

Many domains, such as biology and healthcare, artistic works, and organization names, contain nested, overlapping, or discontinuous entity mentions that may even be syntactically or semantically ambiguous in practice. Traditional sequence-tagging algorithms cannot recognize these complex mentions because such mentions may violate the assumptions on which sequence-tagging schemes are founded. In this paper, we describe our contribution to SemEval 2022 Task 11 on identifying such complex Named Entities. We leveraged an ensemble of multiple ELECTRA-based models pretrained exclusively on Bangla, alongside ELECTRA-based models pretrained on English, to achieve competitive performance on Track 11. Besides providing a system description, we also present the outcomes of our experiments on architectural decisions, dataset augmentations, and…
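
To make the point about sequence-tagging assumptions concrete: a flat BIO scheme assigns exactly one tag per token, so two entities that share a token cannot both be encoded. The minimal sketch below uses a hypothetical helper `spans_to_bio` and illustrative entity types, and it fails on exactly that case.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as a flat BIO sequence (one tag per token).

    Raises ValueError if two spans share a token, since a flat sequence
    has room for only one tag at each position.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(
                    f"token {i} is already tagged {tags[i]!r}; "
                    f"cannot also encode {label!r} span ({start}, {end})"
                )
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


if __name__ == "__main__":
    tokens = ["the", "University", "of", "Dhaka", "library"]
    print(spans_to_bio(len(tokens), [(1, 4, "GRP")]))
    try:
        # "Dhaka" is a location nested inside the larger group mention.
        spans_to_bio(len(tokens), [(1, 4, "GRP"), (3, 4, "LOC")])
    except ValueError as err:
        print("not encodable:", err)
```

Encoding only the outer span yields `O B-GRP I-GRP I-GRP O`, but adding the nested location span for `Dhaka` raises `ValueError`. This is one reason nested mentions are often handled with span-based or multi-layer tagging instead of a single flat sequence.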

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies