TL;DR
This paper investigates how large language models acquire new languages during training, identifying functional specializations and proposing a layer-wise fine-tuning heuristic that improves efficiency and performance.
Contribution
The paper introduces CogSym, a layer-wise heuristic that adapts a model to a new language by fine-tuning only specific layers, which reduces computational cost.
Findings
Perceptual and productive specialization emerge in different model regions.
Fine-tuning only the outermost 25% of layers achieves performance close to full fine-tuning.
CogSym performs comparably to adapter methods like LoRA.
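The layer-selection idea in these findings can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: the function name `select_outer_layers` is hypothetical, and the even split of the budget between the input side and the output side is assumed here because the summary does not say how the 25% is divided.

```python
def select_outer_layers(num_layers: int, fraction: float = 0.25) -> list[int]:
    """Return indices of the outermost layers to fine-tune.

    Assumption (not from the paper summary): the layer budget is split
    evenly between the input side and the output side, with the input
    side receiving the extra layer when the budget is odd.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Number of layers allowed to train, at least one.
    budget = max(1, int(num_layers * fraction))
    n_input = (budget + 1) // 2   # layers closest to the input
    n_output = budget - n_input   # layers closest to the output

    first = list(range(n_input))
    last = list(range(num_layers - n_output, num_layers))
    return first + last


# Example: a 32-layer decoder with a 25% budget.
selected = select_outer_layers(32)
# Boolean mask over layers: True means "train this layer".
trainable = [i in selected for i in range(32)]
```

In a framework such as PyTorch, a mask like `trainable` would typically be applied by setting `requires_grad = False` on the parameters of every layer that is not selected.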
Abstract
Adapting large language models (LLMs) to new languages is an expensive and opaque process. Understanding how language models acquire new languages and multilingual abilities is key to achieving efficient adaptation. Prior multilingual interpretability research focuses primarily on how trained models process multilingual instructions, leaving unexplored the mechanisms through which they acquire new languages during training. We investigate these training dynamics on decoder-only transformers through the lens of two functional cognitive specializations: language perception (input comprehension) and production (output generation). Through experiments on low-resource languages, we demonstrate how perceptual and productive specialization emerges in different regions of a language model by running layer ablation sweeps from the model's input and output directions. Based on the observed…
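As a rough illustration of the layer ablation sweeps mentioned in the abstract, the sketch below enumerates which layers would be ablated at each step when sweeping from the input side or from the output side. The function name `ablation_sweep` and the choice to grow the ablated set one layer at a time are assumptions made for this example. What "ablating" a layer means in practice (skipping it, freezing it, or something else) is left open, since the abstract excerpt does not specify it.

```python
def ablation_sweep(num_layers: int, direction: str) -> list[list[int]]:
    """Enumerate ablated-layer sets for a one-directional sweep.

    Assumption (hypothetical, for illustration): step k ablates the k
    layers nearest the chosen end, for k = 1 .. num_layers - 1, so at
    least one layer is always left intact.
    """
    if num_layers < 2:
        raise ValueError("need at least two layers to sweep")
    if direction == "input":
        # Grow the ablated set from the first layer inward.
        return [list(range(k)) for k in range(1, num_layers)]
    if direction == "output":
        # Grow the ablated set from the last layer inward.
        return [list(range(num_layers - k, num_layers))
                for k in range(1, num_layers)]
    raise ValueError("direction must be 'input' or 'output'")


# Example: both sweeps over a toy 4-layer model.
from_input = ablation_sweep(4, "input")
from_output = ablation_sweep(4, "output")
```

Comparing how performance changes across the two sweeps is one way to see whether layers near the input and layers near the output play different roles.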