On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation
Dan Iter, David Grangier

TL;DR
This paper investigates how data selection complements fine-tuning when adapting language models and machine translation systems to a new domain, and derives practical guidelines for combining the two.
Contribution
It shows that data selection and fine-tuning are complementary, and gives recommendations on how similar the selected data should be to the target domain, how much of it to select, and when during training to apply selection.
Findings
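Selected data should be similar to the target domain, but not so similar that it erodes the complementary benefit of fine-tuning.
There is a trade-off between data quantity and adaptation speed: selecting little data gives fast but limited progress, while selecting much data gives slow but long-lasting progress.
Applying data selection early during pretraining yields gains comparable to extended pretraining.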
Abstract
Domain adaptation of neural networks commonly relies on three training phases: pretraining, selected data training and then fine-tuning. Data selection improves target domain generalization by continuing training on pretraining data that is selected using a small sample of target domain data. This work examines the benefit of data selection for language modeling and machine translation. Our experiments assess the complementarity of selection with fine-tuning and result in practical recommendations: (i) selected data must be similar to the fine-tuning domain but not so much as to erode the complementary effect of fine-tuning; (ii) there is a trade-off between selecting little data for fast but limited progress or much data for slow but long-lasting progress; (iii) data selection can be applied early during pretraining, with performance gains comparable to a long pretraining session; (iv)…
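The selection step described in the abstract ranks pretraining data with the help of a small target-domain sample. It can be sketched with one common criterion, the Moore-Lewis cross-entropy difference. This choice is an assumption for illustration and is not necessarily the scoring method used in the paper. The sketch trains a target-domain language model and a generic one, then scores each pool sentence s by H_target(s) - H_generic(s), the per-token cross-entropy difference. Finally it keeps the lowest-scoring fraction of the pool. To stay self-contained it uses add-one smoothed unigram models. The function names (`train_unigram`, `cross_entropy`, `select`) and the `fraction` parameter are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Add-one smoothed unigram LM; returns a function token -> log-probability."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        # Counter returns 0 for unseen tokens, so they get the smoothed floor.
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a sentence under a unigram LM."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)


def select(pool, target_sample, fraction):
    """Hypothetical helper: keep the `fraction` of `pool` with the lowest
    cross-entropy difference H_target(s) - H_generic(s) (Moore-Lewis).
    A lower score means the sentence looks more like the target sample."""
    target_lm = train_unigram(target_sample)
    generic_lm = train_unigram(pool)

    def score(s):
        return cross_entropy(target_lm, s) - cross_entropy(generic_lm, s)

    ranked = sorted(pool, key=score)
    k = max(1, int(len(pool) * fraction))
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        "the patient received a dose of insulin",
        "the stock market fell sharply today",
        "insulin dose was adjusted for the patient",
        "the team won the football match",
    ]
    target = ["patient insulin dose", "the patient dose was increased"]
    print(select(pool, target, 0.5))
```

With this toy pool, `select(pool, target, 0.5)` keeps the two medical sentences and ranks "insulin dose was adjusted for the patient" first. Varying `fraction` is one simple knob for the quantity side of trade-off (ii). A realistic setup would score with neural language models rather than unigram counts.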
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
