Adapting Large Language Models to Low-Resource Tibetan: A Two-Stage Continual and Supervised Fine-Tuning Study
Lifeng Chen, Ryan Lai, Tianming Liu

TL;DR
This paper presents a two-stage fine-tuning approach for adapting large language models to Tibetan. It reports substantial gains in translation quality and analyzes how adaptation is distributed across model layers in a low-resource setting.
Contribution
The paper introduces a two-stage adaptation process for Tibetan that combines Continual Pretraining (CPT) with Supervised Fine-Tuning (SFT), together with a layer-wise analysis of where adaptation occurs in the model.
Findings
Perplexity decreased from 2.98 to 1.54
Translation BLEU score improved from 0.046 to 0.261
Layer-wise analysis reveals adaptation mainly in embedding and output layers
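As a rough illustration of the first finding, the sketch below computes perplexity in the standard way, as the exponential of the mean negative log-likelihood per token, and then derives the relative reduction implied by the reported values (2.98 and 1.54). The function names and the four-token example are hypothetical and not taken from the paper, and the paper's exact evaluation setup (tokenizer, corpus, log base) is not given here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Hypothetical per-token probabilities for a 4-token sequence (illustrative only).
example_log_probs = [math.log(p) for p in (0.5, 0.25, 0.5, 0.25)]
ppl = perplexity(example_log_probs)

# Perplexity values reported in the Findings list above.
ppl_drop = relative_reduction(2.98, 1.54)

print(f"toy perplexity: {ppl:.4f}")
print(f"relative perplexity reduction: {ppl_drop:.1%}")
```

Under this definition, the reported change corresponds to a reduction of roughly 48% in perplexity.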
Abstract
Adapting large language models (LLMs) to low-resource languages remains a major challenge due to data scarcity and cross-lingual drift. This work presents a two-stage adaptation of Qwen2.5-3B to Tibetan, a morphologically rich and underrepresented language. We employ Continual Pretraining (CPT) to establish Tibetan linguistic grounding, followed by Supervised Fine-Tuning (SFT) for task and translation specialization. Empirical evaluations demonstrate a consistent decrease in perplexity (from 2.98 to 1.54) and substantial improvements in Chinese-Tibetan translation quality (BLEU: 0.046 → 0.261; chrF: 2.2 → 6.6). Layer-wise analysis across 435 layers in Qwen3-4B reveals that adaptation primarily concentrates on embedding and output heads, with mid-to-late MLP projections encoding domain-specific transformations. Our findings suggest that CPT…
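One way to picture a layer-wise analysis of this kind is to compare each parameter tensor before and after adaptation and rank tensors by how much they moved. The sketch below does this with the relative L2 change. This metric is an assumption chosen for illustration, since the text here does not state which measure the authors used. The parameter names and toy weights are likewise hypothetical.

```python
import math


def relative_change(base, adapted):
    """Relative L2 change ||adapted - base|| / ||base|| for two flat weight lists."""
    if len(base) != len(adapted):
        raise ValueError("weight lists must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(adapted, base)))
    norm = math.sqrt(sum(b * b for b in base))
    if norm == 0.0:
        return 0.0 if diff == 0.0 else float("inf")
    return diff / norm


def rank_layers(base_state, adapted_state):
    """Return (name, relative change) pairs, most-changed first."""
    scores = {
        name: relative_change(base_state[name], adapted_state[name])
        for name in base_state
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy flattened weights with hypothetical parameter names (not from the paper).
base_state = {
    "embed_tokens": [1.0, 0.0, 0.0, 1.0],
    "layers.0.mlp.up_proj": [2.0, 2.0],
    "lm_head": [1.0, 1.0],
}
adapted_state = {
    "embed_tokens": [1.5, 0.0, 0.0, 1.5],
    "layers.0.mlp.up_proj": [2.2, 2.0],
    "lm_head": [1.3, 1.4],
}

ranking = rank_layers(base_state, adapted_state)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

With real checkpoints, the same ranking could be computed over flattened tensors from the base and adapted models. A normalized measure is used here so that large and small tensors can be compared on the same scale.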
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods
