Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali
Sharad Duwal, Suraj Prasai, Suresh Manandhar

TL;DR
This paper investigates the effectiveness of domain-adaptive pre-training for low-resource Nepali language tasks by adapting Llama 3 using synthetic data, and evaluates its performance, knowledge retention, and linguistic capabilities.
Contribution
It demonstrates the feasibility of domain-adaptive pre-training in low-resource settings and analyzes its impact on model performance and linguistic knowledge in Nepali.
Findings
The final model shows some forgetting but retains significant Nepali knowledge.
Increasing the number of evaluation shots improves performance more for the final model than for the base model.
Layer-head self-attention analysis reveals dependency resolution abilities in Nepali.
Abstract
Continual learning has emerged as an important research direction due to the infeasibility of retraining large language models (LLMs) from scratch whenever new data becomes available. Of great interest is the domain-adaptive pre-training (DAPT) paradigm, which focuses on continually training a pre-trained language model to adapt it to a domain it was not originally trained on. In this work, we evaluate the feasibility of DAPT in a low-resource setting, namely the Nepali language. We use synthetic data to continue training Llama 3 8B to adapt it to the Nepali language in a 4-bit QLoRA setting. We evaluate the adapted model on its performance, forgetting, and knowledge acquisition. We compare the base model and the final model on their Nepali generation abilities and their performance on popular benchmarks, and we run case studies to probe their linguistic knowledge in Nepali. We see some…
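To give a rough sense of why a 4-bit QLoRA setting suits a low-resource budget, the arithmetic can be sketched as below. This is a back-of-the-envelope illustration, not the authors' configuration: the nominal 8 billion parameter count, the 4096 x 4096 projection size, and the adapter rank of 16 are assumptions chosen for the example, and the estimate ignores quantization constants, activations, and optimizer state.

```python
def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Nominal "8B" parameter count (assumption; the exact count differs slightly).
N_PARAMS = 8_000_000_000

full_16bit = weight_memory_gib(N_PARAMS, 16)  # frozen base stored in 16-bit
base_4bit = weight_memory_gib(N_PARAMS, 4)    # frozen base stored in 4-bit

# Hypothetical adapter on a single 4096 x 4096 projection with rank 16.
adapter = lora_trainable_params(4096, 4096, 16)
dense = 4096 * 4096

print(f"16-bit base weights: {full_16bit:.2f} GiB")
print(f"4-bit base weights:  {base_4bit:.2f} GiB")
print(f"adapter params per projection: {adapter} of {dense} dense params")
```

Under these assumptions, the quantized base needs a quarter of the weight storage, and each adapter trains 1/128 of the parameters of the projection it wraps, which is what makes continued training feasible on modest hardware.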
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Balanced Selection · LLaMA
