Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi

Md Nishat Raihan; Dhiman Goswami; Antara Mahmud

arXiv:2309.10272 · cs.CL · March 15, 2024





TL;DR

This paper introduces Tri-Distil-BERT and Mixed-Distil-BERT, two models pre-trained and fine-tuned on code-mixed Bangla, English, and Hindi data, showing competitive performance in multilingual NLP tasks.

Contribution

The paper presents a novel two-tiered pre-training and fine-tuning approach with smaller models tailored for code-mixed language understanding.
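The page does not state the pre-training objective. Assuming the standard BERT/DistilBERT masked-language-modeling (MLM) objective, with BERT's 15% selection rate and 80/10/10 replacement rule, the masking step that both tiers would share (first on monolingual Bangla, English, and Hindi text, then on code-mixed text) can be sketched in plain Python. `mask_tokens`, `IGNORE_INDEX`, and the toy token ids are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Optional, Set, Tuple

# Label value for positions the loss should ignore (a common convention, assumed here).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style MLM masking (assumed objective, not confirmed by the source).

    Returns (inputs, labels): inputs is the corrupted sequence fed to the model;
    labels holds the original id at selected positions and IGNORE_INDEX elsewhere.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never select special tokens such as [CLS]/[SEP]; otherwise select with prob mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model is trained to recover this original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80% of selected positions: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token id
        # remaining 10%: leave the token unchanged
    return inputs, labels


# Hypothetical two-tier usage: the same masking is applied to stage-1 batches
# (monolingual Bangla/English/Hindi) and then to stage-2 batches (code-mixed text).
if __name__ == "__main__":
    toy_ids = [101, 2023, 2003, 1037, 102]  # toy ids; 101/102 stand in for [CLS]/[SEP]
    inputs, labels = mask_tokens(toy_ids, mask_id=103, vocab_size=30000,
                                 special_ids={101, 102}, rng=random.Random(0))
    print(inputs, labels)
```

Keeping the masking independent of the corpus is what makes a two-tier schedule cheap in this sketch: only the data source changes between tiers, while the objective and model are carried over.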

Findings

1. Mixed-Distil-BERT outperforms some larger models like mBERT and XLM-R on code-mixed NLP tasks.
2. Pre-training on code-mixed data improves model performance on related downstream tasks.
3. The models offer efficient alternatives for multilingual and code-mixed language processing.

Abstract

One of the most popular downstream tasks in Natural Language Processing is text classification, and it becomes considerably harder when the texts are code-mixed. Although BERT models are not exposed to such text during pre-training, several of them have performed well on code-mixed NLP tasks. To further improve performance, code-mixed NLP models have also relied on combining synthetic data with real-world data. It is therefore important to understand how the performance of BERT models changes when they are pre-trained on the corresponding code-mixed languages. In this paper, we introduce Tri-Distil-BERT, a multilingual model pre-trained on Bangla, English, and Hindi, and Mixed-Distil-BERT, a model fine-tuned on code-mixed data. Both models are evaluated across multiple NLP tasks and demonstrate competitive performance against larger models…




Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Attention Is All You Need · Residual Connection · Adam · Weight Decay · Dropout · Linear Layer · Layer Normalization · WordPiece · Multi-Head Attention · Linear Warmup With Linear Decay
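Among the listed methods, "Linear Warmup With Linear Decay" is a learning-rate schedule commonly used when training BERT-family models. For step t, warmup length W, total steps T, and peak rate η, the rate is η·t/W while t < W and η·max(0, (T − t)/(T − W)) afterwards. The sketch below implements that formula; the function name and the numbers in the usage example are hypothetical and are not the paper's hyperparameters.

```python
def linear_warmup_linear_decay(step: int, peak_lr: float,
                               warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: ramps linearly from 0 to peak_lr over warmup_steps,
    then decays linearly to 0 at total_steps and stays at 0 afterwards."""
    if step < warmup_steps:
        # Warmup phase: fraction of warmup completed times the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fraction of the decay span still remaining, clamped at 0.
    decay_span = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_span)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not the paper's settings.
    for s in (0, 500, 1000, 5500, 10000):
        print(s, linear_warmup_linear_decay(s, peak_lr=5e-5,
                                            warmup_steps=1000, total_steps=10000))
```

The `max(1, ...)` guards only prevent division by zero when a caller passes zero warmup or a total equal to the warmup length; they do not change the schedule otherwise.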