Adapting Multilingual Models to Code-Mixed Tasks via Model Merging

Prashant Kodali; Vaishnavi Shivkumar; Swarang Joshi; Monojit Choudhary; Ponnurangam Kumaraguru; Manish Shrivastava

arXiv:2510.19782·cs.CL·October 24, 2025

TL;DR

This paper uses model merging to adapt multilingual models to code-mixed NLP tasks, reporting better performance than full fine-tuning and than continued pre-training followed by fine-tuning (CPT->FT), especially in low-resource settings.

Contribution

It proposes a model merging approach for code-mixed NLP adaptation and demonstrates higher F1 and better transfer across language pairs than full fine-tuning and CPT->FT.

Findings

01. Merged models outperform full fine-tuning by 2-5 F1 points and CPT->FT by about 1-2 F1 points.

02. Unlabeled data is leveraged more effectively via merging than via CPT alone.

03. Merged checkpoints transfer better across language pairs.

Abstract

We study model merging as a practical alternative to conventional adaptation strategies for code-mixed NLP. Starting from a multilingual base model, we: (i) perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an adapted checkpoint, (ii) merge the adapted checkpoint with the base model, and (iii) fine-tune (FT) on the downstream task data. We evaluate our approach on sentence classification tasks (sentiment and hate speech) in English-Hindi (En-Hi) and English-Spanish (En-Es) using XLM-R and Llama-3.2-1B models. Our results show that merged models consistently outperform full fine-tuning and CPT->FT. We observe gains of 2-5 points in F1 over full fine-tuning and about 1-2 points over CPT->FT, indicating that unlabeled data is leveraged more effectively via merging than via CPT alone. Zero-/few-shot prompting with larger LLMs (e.g., Llama-3.3-70B) lags behind fine-tuned and merged…
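Step (ii) of the pipeline above, merging the CPT checkpoint with the base model, can be illustrated with a small sketch. The abstract as shown here does not say which merging method is used, so this example assumes plain linear interpolation of weights, which is only one of several possible merging schemes. The function name `merge_linear`, the mixing weight `alpha`, and the toy parameter dictionaries are hypothetical and stand in for real model checkpoints.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
# (Assumption: real checkpoints would hold tensors, not Python lists.)
StateDict = Dict[str, List[float]]


def merge_linear(base: StateDict, adapted: StateDict, alpha: float = 0.5) -> StateDict:
    """Interpolate weights: merged = base + alpha * (adapted - base).

    alpha = 0.0 returns the base weights, alpha = 1.0 returns the adapted
    (CPT) weights. This is an assumed merging rule, not necessarily the
    one used in the paper.
    """
    if base.keys() != adapted.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: StateDict = {}
    for name, base_w in base.items():
        cpt_w = adapted[name]
        if len(base_w) != len(cpt_w):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Move each base weight a fraction alpha of the way toward the CPT weight.
        merged[name] = [b + alpha * (c - b) for b, c in zip(base_w, cpt_w)]
    return merged


# Hypothetical two-parameter "model" before and after continued pre-training.
base = {"layer.weight": [0.0, 1.0, 2.0], "layer.bias": [0.5]}
cpt = {"layer.weight": [1.0, 1.0, 0.0], "layer.bias": [1.5]}

merged = merge_linear(base, cpt, alpha=0.5)
print(merged)
```

In the pipeline described in the abstract, the merged weights would then be the starting point for task fine-tuning in step (iii); under this assumed rule, `alpha` controls how much of the code-mixed adaptation is retained relative to the base model.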

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection