Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi
Md Nishat Raihan, Dhiman Goswami, Antara Mahmud

TL;DR
This paper introduces Tri-Distil-BERT, a DistilBERT-based model pre-trained on Bangla, English, and Hindi, and Mixed-Distil-BERT, which is further trained on code-mixed data in these languages. Both show competitive performance on multilingual NLP tasks.
Contribution
The paper presents a novel two-tiered pre-training and fine-tuning approach with smaller models tailored for code-mixed language understanding.
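The two-tier idea can be pictured as a chain of training stages in which the second tier starts from the first tier's output rather than from the base model. The sketch below records only that ordering and trains nothing. The starting checkpoint (`distilbert-base-multilingual-cased`), the corpus labels, the masked-language-modeling objective, and the names `Stage`, `build_two_tier_plan`, and `validate_chain` are assumptions for illustration, not details confirmed by this summary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str                 # model produced by this stage
    init_from: str            # checkpoint the stage starts from
    corpora: Tuple[str, ...]  # hypothetical corpus labels, not the paper's datasets
    objective: str            # assumed training objective


def build_two_tier_plan(base_checkpoint: str) -> List[Stage]:
    """Two chained stages: tier 2 starts from tier 1's output, not from the base."""
    tier1 = Stage(
        name="Tri-Distil-BERT",
        init_from=base_checkpoint,
        corpora=("bangla", "english", "hindi"),
        objective="masked language modeling",  # assumption
    )
    tier2 = Stage(
        name="Mixed-Distil-BERT",
        init_from=tier1.name,  # continue from tier 1 instead of restarting
        corpora=("bangla_english_hindi_code_mixed",),
        objective="masked language modeling",  # assumption
    )
    return [tier1, tier2]


def validate_chain(plan: List[Stage], base_checkpoint: str) -> None:
    """Raise ValueError if a stage does not start from the previous stage's output."""
    expected = base_checkpoint
    for stage in plan:
        if stage.init_from != expected:
            raise ValueError(
                f"{stage.name} starts from {stage.init_from!r}, expected {expected!r}"
            )
        expected = stage.name


if __name__ == "__main__":
    base = "distilbert-base-multilingual-cased"  # assumed starting checkpoint
    plan = build_two_tier_plan(base)
    validate_chain(plan, base)
    for stage in plan:
        print(f"{stage.name}: init={stage.init_from}, data={stage.corpora}")
```

Downstream task fine-tuning (for example, text classification) would then start from the tier-2 checkpoint.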
Findings
Mixed-Distil-BERT outperforms some larger models like mBERT and XLM-R on code-mixed NLP tasks.
Pre-training on code-mixed data improves model performance on related downstream tasks.
The models offer efficient alternatives for multilingual and code-mixed language processing.
Abstract
One of the most popular downstream tasks in Natural Language Processing is text classification. Text classification becomes more daunting when the texts are code-mixed. Although they are not exposed to such text during pre-training, various BERT models have demonstrated success in tackling code-mixed NLP challenges. Moreover, to enhance their performance, code-mixed NLP models have relied on combining synthetic data with real-world data. It is crucial to understand how the performance of BERT models is affected when they are pre-trained on the corresponding code-mixed languages. In this paper, we introduce Tri-Distil-BERT, a multilingual model pre-trained on Bangla, English, and Hindi, and Mixed-Distil-BERT, a model fine-tuned on code-mixed data. Both models are evaluated across multiple NLP tasks and demonstrate competitive performance against larger models…
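Pre-training a BERT-style model on code-mixed text usually means running a masked-language-modeling objective over it. Assuming that objective (the summary does not state it), the sketch below shows the standard BERT masking step: roughly 15% of positions become prediction targets, of which 80% are replaced by `[MASK]`, 10% by a random vocabulary token, and 10% are left unchanged. The function `mask_tokens`, the whitespace tokenization, and the example sentence are hypothetical; the actual models operate on WordPiece subword tokens.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking: select ~mask_prob of positions as prediction targets;
    of those, 80% become [MASK], 10% a random vocab token, 10% stay unchanged.
    Labels hold the original token at target positions and None elsewhere."""
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)  # not a prediction target
            inputs.append(tok)
    return inputs, labels


if __name__ == "__main__":
    # Invented romanized Hindi-English sentence, whitespace-tokenized for simplicity
    sentence = "yeh movie bahut achhi thi , must watch".split()
    vocab = sorted(set(sentence))
    masked, targets = mask_tokens(sentence, vocab, mask_prob=0.3, seed=1)
    print(masked)
    print(targets)
```

Fixing the seed makes the masking reproducible, which helps when inspecting what the model is asked to predict; training data collators typically re-sample masks for each batch instead.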
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Attention Is All You Need · Residual Connection · Adam · Weight Decay · Dropout · Linear Layer · Layer Normalization · WordPiece · Multi-Head Attention · Linear Warmup With Linear Decay
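The Methods list includes a linear warmup with linear decay learning-rate schedule. A minimal sketch of that schedule follows; it has the same shape as Hugging Face Transformers' `get_linear_schedule_with_warmup`, but the function name and the hyperparameters (peak rate 5e-5, 100 warmup steps, 1000 total steps) are hypothetical, since the excerpt does not give the values used for these models.

```python
def linear_warmup_linear_decay(
    step: int, peak_lr: float, warmup_steps: int, total_steps: int
) -> float:
    """Learning rate at `step`: rises linearly from 0 to peak_lr over warmup_steps,
    then falls linearly to 0 at total_steps and stays at 0 afterwards."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    # Hypothetical hyperparameters; the excerpt does not state the paper's values
    for s in (0, 50, 100, 550, 1000):
        lr = linear_warmup_linear_decay(s, peak_lr=5e-5, warmup_steps=100, total_steps=1000)
        print(s, lr)
```

The warmup phase keeps early updates small while Adam's moment estimates are still noisy; the linear decay then anneals the rate to zero by the final step.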
