Feature Engineering vs BERT on Twitter Data
Ryiaadh Gani, Lisa Chalaguine

TL;DR
This paper compares traditional feature engineering with BERT on three Twitter datasets and finds that BERT's advantage is dataset-dependent: it clearly pays off on only one dataset, while the small gains on the other datasets may not justify the GPU time and cost.
Contribution
It provides a comparative analysis of feature engineering and BERT on three datasets, highlighting when BERT's benefits justify its computational cost.
Findings
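BERT significantly outperforms the traditional feature-vector classifiers on only one of the three datasets.
The time and cost trade-off of BERT is justified only for that dataset.
On the other datasets, BERT improves accuracy by 0.03 and F1 score by 0.05, gains that may not justify its GPU time and cost.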
Abstract
In this paper, we compare the performance of traditional machine learning models that use feature engineering and word vectors with that of the state-of-the-art language model BERT, which uses word embeddings, on three datasets. We also consider the time and cost efficiency of feature engineering compared to BERT. From our results we conclude that using the BERT model was only worth the time and cost trade-off for one of the three datasets, where it significantly outperformed every traditional classifier that uses feature vectors instead of embeddings. On the other datasets, BERT increased accuracy by only 0.03 and F1 score by only 0.05, which arguably does not justify the time and cost of GPU use.
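As an illustration of the two metrics quoted in the abstract, the following is a minimal Python sketch of how the accuracy and F1 gains of one classifier over another could be computed. It assumes a binary task with positive label 1; the function names and the toy label lists are hypothetical and are not taken from the paper or its datasets.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gains(y_true: Sequence[int],
          baseline_pred: Sequence[int],
          bert_pred: Sequence[int]) -> Dict[str, float]:
    """Metric differences (second model minus baseline) on the same test set."""
    return {
        "accuracy_gain": accuracy(y_true, bert_pred) - accuracy(y_true, baseline_pred),
        "f1_gain": f1_score(y_true, bert_pred) - f1_score(y_true, baseline_pred),
    }


# Hypothetical toy labels, for illustration only (not data from the paper).
gold = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
baseline = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # stands in for a feature-based classifier
bert = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]      # stands in for a fine-tuned BERT model

if __name__ == "__main__":
    for name, value in gains(gold, baseline, bert).items():
        print(f"{name}: {value:.3f}")
```

Whether a gain of a given size is worth the extra GPU time is a judgment call that depends on the application; the sketch only makes the compared quantities explicit.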
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Dense Connections · Linear Layer · Layer Normalization · Residual Connection
