Domain Adaptation for Sparse-Data Settings: What Do We Gain by Not Using Bert?
Marina Sedinkina, Martin Schmitt, Hinrich Schütze

TL;DR
This paper compares NLP methods for settings with little labeled in-domain training data and finds that simpler models come close to transfer learning with pre-trained language models at far lower computational cost.
Contribution
It provides a comprehensive comparison of transfer learning and alternative approaches for NLP in low-data settings, offering practical guidelines.
Findings
Transfer learning with pre-trained language models outperforms the other methods across tasks.
The alternatives do not perform much worse while requiring much less computational effort, which reduces monetary and environmental cost.
Some of the alternatives can be trained up to 175K times faster and do not require a GPU.
Abstract
The practical success of much of NLP depends on the availability of training data. However, in real-world scenarios, training data is often scarce, not least because many application domains are restricted and specific. In this work, we compare different methods to handle this problem and provide guidelines for building NLP applications when there is only a small amount of labeled training data available for a specific domain. While transfer learning with pre-trained language models outperforms other methods across tasks, alternatives do not perform much worse while requiring much less computational effort, thus significantly reducing monetary and environmental cost. We examine the performance tradeoffs of several such alternatives, including models that can be trained up to 175K times faster and do not require a single GPU.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
