Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali
Ananda Rimal (Nepal Engineering College), Adarsha Rimal (Tribhuvan University)

TL;DR
This study benchmarks three comparable-sized open-weight LLMs on Romanized Nepali, evaluating zero-shot and fine-tuned performance across multiple metrics, and identifies Qwen3-8B as the most effective architecture after adaptation.
Contribution
It provides the first rigorous baseline for Romanized Nepali adaptation in comparable-sized open-weight LLMs, demonstrating effective fine-tuning with QLoRA and rsLoRA techniques.
Findings
All three models fail in the zero-shot setting but succeed after fine-tuning.
Qwen3-8B outperforms the other two models across metrics after fine-tuning.
Llama-3.1-8B shows the largest gains in perplexity (PPL) and BERTScore after fine-tuning.
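For readers unfamiliar with the PPL figure cited in the findings, perplexity is conventionally the exponential of the mean per-token negative log-likelihood, so a lower value is better and a "gain" means a reduction. The sketch below shows that standard definition only; the function name and the toy log-probabilities are illustrative assumptions, and the paper's actual evaluation setup (tokenizer, context length, batching) is not given in this summary.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities the model assigned to each reference token.

    Illustrative helper (not from the paper): PPL = exp(-mean(log p(token))).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy check: if every token is assigned probability 1/4, the model is
# "as uncertain as" a uniform choice among 4 options.
uniform_quarter = [math.log(0.25)] * 3
print(perplexity(uniform_quarter))
```

A model that assigns higher probability to the reference Romanized Nepali tokens yields a smaller mean negative log-likelihood and therefore a lower perplexity.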
Abstract
Romanized Nepali, the Nepali language written in the Latin alphabet, is the dominant medium for informal digital communication in Nepal, yet it remains critically under-resourced in the landscape of Large Language Models (LLMs). This study presents a systematic benchmark of linguistic adaptation across three comparable-sized open-weight models: Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B. We evaluate these architectures under zero-shot and fine-tuned settings using a curated bilingual dataset of 10,000 transliterated instruction-following samples. Performance is quantified across five metrics spanning seven measurement dimensions: Perplexity (PPL), BERTScore, chrF++, ROUGE-1, ROUGE-2, ROUGE-L, and BLEU, capturing fluency, phonetic consistency, and semantic integrity. Models were fine-tuned using Quantized Low-Rank Adaptation (QLoRA) with Rank-Stabilized LoRA (rsLoRA) at rank r=32 on…
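As background on the rsLoRA setting named in the abstract: standard LoRA scales the low-rank update by alpha / r, while rank-stabilized LoRA scales it by alpha / sqrt(r), so the update shrinks more slowly as the rank grows. The sketch below compares the two factors at a few ranks, including the r=32 used in the study. The alpha value is a hypothetical placeholder, since this summary does not state the one the authors used, and the function names are illustrative.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor: alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


ALPHA = 16.0  # hypothetical value; the summary does not give the paper's alpha

for rank in (8, 32, 128):
    print(
        f"r={rank:>3}  LoRA scale={lora_scale(ALPHA, rank):.4f}  "
        f"rsLoRA scale={rslora_scale(ALPHA, rank):.4f}"
    )
```

At any rank above 1 the rsLoRA factor is larger than the standard one by a factor of sqrt(r), which is the motivation usually given for using it at higher ranks.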
