Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies

Boshko Koloski; Blaž Škrlj; Marko Robnik-Šikonja; Senja Pollak

arXiv:2309.06089·cs.CL·February 18, 2025

Open Access

TL;DR

This study compares fine-tuning strategies and transfer methods in cross-lingual models, analyzing their impact on catastrophic forgetting across multiple languages in classification tasks.

Contribution

It provides an empirical comparison of parameter-efficient adapters versus full fine-tuning and evaluates intermediate-training versus cross-lingual validation strategies.

Findings

01

Intermediate training (IT) outperforms cross-lingual validation (CLV) for target-language transfer.

02

CLV better retains source-language knowledge.

03

Results vary across tasks and languages.

Abstract

Cross-lingual transfer is a promising technique for solving tasks in less-resourced languages. In this empirical study, we compare two fine-tuning approaches combined with zero-shot and full-shot learning approaches for large language models in a cross-lingual setting. As fine-tuning strategies, we compare parameter-efficient adapter methods with fine-tuning of all parameters. As cross-lingual transfer strategies, we compare intermediate training (IT), which uses each language sequentially, and cross-lingual validation (CLV), which uses a target language already in the validation phase of fine-tuning. We assess the success of transfer and the extent of catastrophic forgetting in a source language due to cross-lingual transfer, i.e., how much previously acquired knowledge is lost when we learn new information in a different language. The results on two different…
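The abstract describes catastrophic forgetting as the source-language knowledge lost after learning in a different language. One simple way to quantify that idea is the drop in a source-language score between two checkpoints. The sketch below assumes this definition; the paper's exact metric is not given in the text above, and the names `TransferResult`, `forgetting`, and `relative_forgetting`, as well as all numbers, are hypothetical placeholders for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TransferResult:
    """Scores for one source -> target transfer run (hypothetical container)."""
    source_before: float  # source-language score after training on the source only
    source_after: float   # source-language score after exposure to the target language
    target_after: float   # target-language score after transfer


def forgetting(result: TransferResult) -> float:
    """Absolute drop in the source-language score; positive means knowledge was lost."""
    return result.source_before - result.source_after


def relative_forgetting(result: TransferResult) -> float:
    """Drop in the source-language score as a fraction of the original score."""
    if result.source_before == 0:
        raise ValueError("source_before must be non-zero")
    return forgetting(result) / result.source_before


# Placeholder numbers, not results from the paper.
run_a = TransferResult(source_before=0.80, source_after=0.70, target_after=0.75)
run_b = TransferResult(source_before=0.80, source_after=0.78, target_after=0.72)

for name, run in [("run_a", run_a), ("run_b", run_b)]:
    print(f"{name}: forgetting={forgetting(run):.3f}, "
          f"relative={relative_forgetting(run):.3f}, "
          f"target={run.target_after:.3f}")
```

Reporting the target-language score next to the forgetting value keeps both sides of the trade-off visible: a strategy can transfer well to the target while losing more of the source, or the reverse.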

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Speech Recognition and Synthesis

Methods: Adapter · Balanced Selection