Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI
Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger

TL;DR
Pre-trained language models generalize well across domains: they outperform transformers trained from scratch on a range of non-language tasks, and they remain effective when their parameter count is reduced. The authors present this as a step toward general AI.
Contribution
This study systematically evaluates pre-trained language models on diverse non-language tasks, highlighting their generalization ability and the importance of pre-trained embeddings.
Findings
Pre-trained models outperform from-scratch transformers by large margins.
Reducing parameters in pre-trained models has minimal performance impact.
Pre-trained embeddings are essential for optimal results.
Abstract
Pre-trained language models have recently emerged as a powerful tool that can be fine-tuned for a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to gain implicit knowledge. In this paper, we investigate the ability of pre-trained language models to generalize to different non-language tasks. In particular, we test them on tasks from different domains such as computer vision, reasoning on hierarchical data, and protein fold prediction. The four pre-trained models that we used, T5, BART, BERT, and GPT-2, achieve outstanding results. They all have similar performance and they outperform transformers that are trained from scratch by a large margin. For instance, pre-trained language models perform better on the Listops dataset, with an average accuracy of 58.7%, compared to transformers trained from scratch, which have an average accuracy of…
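For readers unfamiliar with the Listops task mentioned in the abstract, the following is a minimal sketch of how one such hierarchical expression is evaluated. The operator set (MAX, MIN, MED, and SM for sum modulo 10) and the bracketed input format are assumptions taken from the original ListOps benchmark, not from this summary, and the function name `evaluate_listops` is hypothetical. The models in the paper must predict the final digit from the token sequence alone.

```python
from statistics import median

# Operator set assumed from the original ListOps benchmark (not stated in this summary).
OPS = {
    "MAX": max,
    "MIN": min,
    "MED": lambda xs: int(median(xs)),  # assumption: median truncated to an integer
    "SM": lambda xs: sum(xs) % 10,      # sum modulo 10
}


def evaluate_listops(expression: str) -> int:
    """Evaluate a bracketed prefix expression such as '[MAX 2 9 [MIN 4 7 ] 0 ]'."""
    # Separate brackets from operator names so every token stands alone.
    tokens = expression.replace("[", " [ ").replace("]", " ] ").split()
    stack = []
    for token in tokens:
        if token == "]":
            # Collect the integer arguments of the innermost open list.
            args = []
            while isinstance(stack[-1], int):
                args.append(stack.pop())
            operator = stack.pop()  # operator name, e.g. "MAX"
            stack.pop()             # discard the matching "["
            stack.append(OPS[operator](args[::-1]))
        elif token.isdigit():
            stack.append(int(token))
        else:
            stack.append(token)     # "[" or an operator name
    return stack[0]


print(evaluate_listops("[MAX 2 9 [MIN 4 7 ] 0 ]"))
```

The nesting is what makes the task a test of reasoning on hierarchical data: the inner list has to be resolved before the outer operator can be applied.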
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Adafactor · Discriminative Fine-Tuning · SentencePiece · Layer Normalization
