Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali

Sharad Duwal; Suraj Prasai; Suresh Manandhar

arXiv:2412.13860·cs.CL·December 19, 2024

TL;DR

This paper investigates the effectiveness of domain-adaptive pre-training for low-resource Nepali language tasks by adapting Llama 3 using synthetic data, and evaluates its performance, knowledge retention, and linguistic capabilities.

Contribution

It demonstrates the feasibility of domain-adaptive pre-training in low-resource settings and analyzes its impact on model performance and linguistic knowledge in Nepali.

Findings

01. The final model shows some forgetting but retains significant Nepali knowledge.

02. Increasing the number of evaluation shots improves the final model's performance more than the base model's.

03. Layer-head self-attention heatmaps reveal dependency-resolution abilities in Nepali.
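The second finding compares how much each model gains from additional evaluation shots. One common way to make such gains comparable across models with different starting scores is the relative percent increase. The sketch below shows that calculation; the function name and every score in it are hypothetical placeholders for illustration, not results from the paper.

```python
# Sketch: comparing few-shot gains between two models via relative percent increase.
# All scores below are HYPOTHETICAL placeholders, not values reported in the paper.

def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("percent increase is undefined for a zero baseline")
    return 100.0 * (after - before) / before


# Hypothetical benchmark scores at 0 shots and at 5 shots.
hypothetical_scores = {
    "base": {"zero_shot": 60.0, "five_shot": 63.0},
    "final": {"zero_shot": 40.0, "five_shot": 46.0},
}

gains = {
    name: percent_increase(s["zero_shot"], s["five_shot"])
    for name, s in hypothetical_scores.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}% increase from 0 to 5 shots")
```

With these placeholder numbers the adapted model gains more in relative terms even though its absolute scores are lower, which is the kind of pattern the finding describes.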

Abstract

Continual learning has emerged as an important research direction due to the infeasibility of retraining large language models (LLMs) from scratch whenever new data becomes available. Of great interest is the domain-adaptive pre-training (DAPT) paradigm, which focuses on continually training a pre-trained language model to adapt it to a domain it was not originally trained on. In this work, we evaluate the feasibility of DAPT in a low-resource setting, namely the Nepali language. We use synthetic data to continue training Llama 3 8B to adapt it to the Nepali language in a 4-bit QLoRA setting. We evaluate the adapted model on its performance, forgetting, and knowledge acquisition. We compare the base model and the final model on their Nepali generation abilities and their performance on popular benchmarks, and run case studies to probe their linguistic knowledge in Nepali. We see some…
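The abstract mentions a 4-bit QLoRA setting. As a rough illustration of why that setup suits constrained budgets, the sketch below counts the trainable parameters a low-rank adapter adds to one frozen weight matrix and estimates the storage for 4-bit base weights. The matrix shape, the rank, and the helper names are assumptions chosen for illustration; the abstract does not state the adapter configuration used.

```python
# Sketch: why LoRA adapters on a 4-bit base model are cheap to train.
# The shape (4096 x 4096), the rank (16), and the helper names are ASSUMPTIONS
# for illustration; they are not the paper's reported configuration.

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a LoRA adapter for a (d_out x d_in) weight.

    The weight update is factored as B @ A, with A of shape (rank x d_in)
    and B of shape (d_out x rank).
    """
    return rank * d_in + d_out * rank


def four_bit_weight_bytes(n_params: int) -> float:
    """Approximate bytes needed to store n_params weights at 4 bits each.

    Ignores quantization constants and other overhead.
    """
    return n_params * 0.5


HIDDEN = 4096  # assumed width of a square projection matrix
RANK = 16      # assumed LoRA rank

full = HIDDEN * HIDDEN
adapter = lora_param_count(HIDDEN, HIDDEN, RANK)
fraction = adapter / full

print(f"full matrix params: {full}")
print(f"adapter params:     {adapter}")
print(f"trainable fraction: {fraction:.4%}")

# Rough weight storage for an 8-billion-parameter model at 4 bits.
approx_gb = four_bit_weight_bytes(8_000_000_000) / 1e9
print(f"approx. 4-bit weight storage: {approx_gb:.1f} GB")
```

Under these assumptions the adapter trains under one percent of the parameters of the matrix it modifies, while the frozen base weights take roughly half a byte each.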

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection · LLaMA