TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla
Nazia Tasnim, Md. Istiak Hossain Shihab, Asif Shahriyar Sushmit, Steven Bethard, Farig Sadeque

TL;DR
This paper presents a system that uses an ensemble of Bangla-specific ELECTRA models and data augmentation techniques to recognize complex, nested, and overlapping Named Entities in Bangla, achieving competitive performance on SemEval-2022 Task 11.
Contribution
It introduces an ensemble of Bangla-pretrained ELECTRA models, combined with data augmentation, for complex Named Entity recognition in Bangla.
Findings
An ensemble of Bangla ELECTRA models improves recognition accuracy.
Data augmentation enhances model robustness.
The system achieves competitive performance on SemEval-2022 Task 11.
Abstract
Many areas, such as the biological and healthcare domain, artistic works, and organization names, have nested, overlapping, discontinuous entity mentions that may even be syntactically or semantically ambiguous in practice. Traditional sequence tagging algorithms are unable to recognize these complex mentions because they may violate the assumptions upon which sequence tagging schemes are founded. In this paper, we describe our contribution to SemEval 2022 Task 11 on identifying such complex Named Entities. We have leveraged the ensemble of multiple ELECTRA-based models that were exclusively pretrained on the Bangla language with the performance of ELECTRA-based models pretrained on English to achieve competitive performance on the Track-11. Besides providing a system description, we will also present the outcomes of our experiments on architectural decisions, dataset augmentations, and…
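To make the ensembling idea concrete, here is a minimal sketch of one common way to combine token-level predictions from several taggers: a per-token majority vote. The abstract does not say how the authors merge their models' outputs, so the voting rule, the `majority_vote` function, and the example tags are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tag sequences from several models by majority vote.

    Hypothetical helper: ties are broken in favor of the earliest model
    in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must tag the same number of tokens")

    ensembled = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # The first tag (in model order) that reaches the top count wins.
        ensembled.append(next(t for t in token_tags if counts[t] == best))
    return ensembled


# Illustrative tag sequences from three hypothetical models for one sentence.
model_a = ["B-PER", "I-PER", "O", "B-LOC"]
model_b = ["B-PER", "O", "O", "B-LOC"]
model_c = ["B-PER", "I-PER", "O", "O"]

combined = majority_vote([model_a, model_b, model_c])
```

Voting on hard labels keeps the sketch independent of any model library. Averaging each model's per-token class probabilities before taking the argmax is a common alternative when those scores are available.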
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
