Performance of Transfer Learning Model vs. Traditional Neural Network in Low System Resource Environment

William Hui

arXiv:2011.07962 · cs.CL · November 17, 2020 · 1 citation

TL;DR

This paper compares the performance and resource efficiency of lightweight transfer learning models versus traditional neural networks in low-resource environments for NLP tasks like text classification and NER.

Contribution

It provides an analysis of the trade-offs between transfer learning models and traditional neural networks under limited computational resources.

Findings

01. Lighter transfer learning models require fewer resources than large pre-trained models.

02. Traditional neural networks can be more efficient in extremely low-resource settings.

03. Transfer learning models generally achieve higher accuracy with moderate resource use.

Abstract

Recently, building neural networks from pre-trained models through transfer learning has become increasingly popular. Pre-trained models offer the benefit of requiring fewer computing resources and a smaller amount of training data to train a model. State-of-the-art models such as BERT, XLNet, and GPT boost accuracy and serve well as base models for transfer learning. However, these models are still too complex and consume too many computing resources to fine-tune on hardware with low GPU memory. We compare the performance and cost of lighter transfer learning models and purpose-built neural networks for the NLP applications of text classification and NER.
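
To make the GPU memory concern concrete, here is a rough back-of-envelope sketch and not the paper's methodology. It assumes fp32 training with an Adam-style optimizer that keeps two extra states per parameter, and it ignores activations, so it gives only a lower bound. The function name and the parameter counts are illustrative assumptions: about 110M for a BERT-base-sized model, and a hypothetical 5M for a purpose-built network.

```python
BYTES_PER_PARAM_FP32 = 4  # one 32-bit float per value


def training_memory_gib(num_params: int, optimizer_states: int = 2) -> float:
    """Lower-bound training memory in GiB.

    Counts fp32 weights, gradients, and optimizer states only.
    Activations, which often dominate in practice, are excluded.
    """
    # One copy for weights, one for gradients, plus optimizer states per parameter.
    copies_per_param = 1 + 1 + optimizer_states
    return num_params * BYTES_PER_PARAM_FP32 * copies_per_param / 2**30


# Illustrative parameter counts (assumptions, not figures from the paper).
models = {
    "BERT-base-sized model (approx.)": 110_000_000,
    "hypothetical purpose-built network": 5_000_000,
}

for name, num_params in models.items():
    print(f"{name}: at least {training_memory_gib(num_params):.2f} GiB")
```

Because activations scale with batch size and sequence length, actual usage will be higher than this estimate, which is one reason smaller base models matter on low-memory GPUs.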

Taxonomy

Topics: Machine Learning and ELM

Methods: Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Byte Pair Encoding · Dropout · Softmax · Multi-Head Attention · Attention Dropout · Residual Connection