A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models
Firoj Alam, Arid Hasan, Tanvirul Alam, Akib Khan, Janntatul Tajrin, Naira Khan, Shammur Absar Chowdhury

TL;DR
This paper surveys Bangla NLP tasks, resources, and recent transformer-based model advances, benchmarking datasets and models to highlight progress and challenges in this low-resource language.
Contribution
It provides the first comprehensive review and benchmarking of Bangla NLP tasks using state-of-the-art transformer models, including resource analysis and experimental results.
Findings
Transformer models show promising performance on Bangla NLP tasks.
Multilingual models perform comparably to monolingual models for certain tasks.
Computational costs increase with model size and complexity.
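The last finding can be made concrete with a back-of-the-envelope parameter count. The sketch below is illustrative only and is not taken from the paper: it assumes a standard transformer encoder whose feed-forward width is 4 times the hidden size, counts only the large weight matrices (no biases, layer normalization, or embeddings), and uses two hypothetical configurations that match the commonly cited BERT-base and BERT-large shapes. The function names are made up for this example.

```python
def encoder_layer_params(d_model: int, ffn_mult: int = 4) -> int:
    """Approximate weight count of one transformer encoder layer.

    Counts 4 * d^2 for the query, key, value, and output projections,
    plus 2 * ffn_mult * d^2 for the two feed-forward matrices.
    Biases, layer normalization, and embeddings are ignored (assumption).
    """
    attention = 4 * d_model * d_model
    feed_forward = 2 * ffn_mult * d_model * d_model
    return attention + feed_forward


def encoder_params(num_layers: int, d_model: int, ffn_mult: int = 4) -> int:
    """Approximate weight count of a stack of identical encoder layers."""
    return num_layers * encoder_layer_params(d_model, ffn_mult)


# Hypothetical configurations, chosen only to illustrate the scaling.
base = encoder_params(num_layers=12, d_model=768)
large = encoder_params(num_layers=24, d_model=1024)

print(f"base-sized encoder:  {base:,} weights")
print(f"large-sized encoder: {large:,} weights")
print(f"ratio: {large / base:.2f}x")
```

Because the count grows linearly with depth and quadratically with hidden size, doubling the layers and widening the hidden size from 768 to 1024 multiplies the encoder weights by roughly 3.6, which is one reason training and inference costs rise with model size.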
Abstract
Bangla, ranked as the 6th most widely spoken language in the world (https://www.ethnologue.com/guides/ethnologue200) with 230 million native speakers, is still considered a low-resource language in the natural language processing (NLP) community. Despite three decades of research, Bangla NLP (BNLP) still lags behind, mainly due to the scarcity of resources and the challenges that come with it. Work exists in different areas of BNLP, but it is sparse, and a thorough survey reporting previous work and recent advances is yet to be done. In this study, we first provide a review of Bangla NLP tasks, resources, and tools available to the research community; we then benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms (i.e., transformer-based models). We provide comparative results for the studied NLP tasks by comparing…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · WordPiece · Layer Normalization · Adam · Dropout · Attention Dropout
