A Comparison of LSTM and BERT for Small Corpus

Aysu Ezen-Can

arXiv:2009.05451 · cs.CL · September 14, 2020 · 67 cites

Open Access

TL;DR

This paper compares a simple bidirectional LSTM with a pre-trained BERT model on a small intent classification dataset. It finds that the simpler LSTM outperforms BERT in both accuracy and training time, and it argues that model selection should account for the task and the data.

Contribution

The paper provides empirical evidence that a simple LSTM model can outperform BERT on small datasets, challenging the assumption that larger pre-trained models are always superior in such scenarios.

Findings

01. LSTM outperforms BERT on small intent classification datasets.

02. LSTM models train faster than BERT in small data settings.

03. Model choice should consider task and data characteristics.

Abstract

Recent advancements in the NLP field showed that transfer learning helps with achieving state-of-the-art results for new tasks by tuning pre-trained models instead of starting from scratch. Transformers have made a significant improvement in creating new state-of-the-art results for many NLP tasks including but not limited to text classification, text generation, and sequence labeling. Most of these success stories were based on large datasets. In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than simple models? To answer this question, we use a small dataset for intent classification collected for building chatbots and compare the performance of a simple bidirectional LSTM model with a pre-trained BERT model. Our experimental results show…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Softmax · Sigmoid Activation · Layer Normalization · Tanh Activation · Long Short-Term Memory · Weight Decay · Dropout · Linear Warmup With Linear Decay · Dense Connections
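
The methods list names "Linear Warmup With Linear Decay". As a rough illustration of what such a learning-rate schedule usually computes, the sketch below returns a multiplier that rises linearly from 0 to 1 during warmup and then falls linearly back to 0 by the last step. The function name, the `warmup_steps` and `total_steps` parameters, and the example values are assumptions made for illustration; this page does not state the hyperparameters used in the paper.

```python
def linear_warmup_linear_decay(step, warmup_steps, total_steps):
    """Return a learning-rate multiplier in [0, 1] for the given step.

    Hypothetical helper: the parameter names are illustrative and are
    not taken from the paper.
    """
    if total_steps <= 0 or not 0 <= warmup_steps <= total_steps:
        raise ValueError("need 0 <= warmup_steps <= total_steps and total_steps > 0")
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to 1.
        return step / warmup_steps
    # Decay phase: ramp linearly from 1 down to 0 at total_steps.
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    base_lr = 2e-5  # assumed example value, not from the paper
    for step in (0, 5, 10, 55, 100):
        multiplier = linear_warmup_linear_decay(step, warmup_steps=10, total_steps=100)
        print(step, base_lr * multiplier)
```

The effective learning rate at each step is the base rate times the multiplier, so the peak rate is reached exactly when warmup ends.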