How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task
Urja Khurana, Eric Nalisnick, Antske Fokkens

TL;DR
This paper investigates the robustness of ALBERT in sentiment analysis by applying Stochastic Weight Averaging (SWA) and analyzing stability and error consistency across models with different random seeds.
Contribution
It demonstrates that SWA improves ALBERT's error rates and stability, providing insights into model robustness and bias in NLP tasks.
Findings
SWA reduces overall error rates.
Models with different seeds still exhibit distinct biases.
SWA enhances stability but does not eliminate biases.
Abstract
Despite their success, modern language models are fragile. Even small changes in their training pipeline can lead to unexpected results. We study this phenomenon by examining the robustness of ALBERT (arXiv:1909.11942) in combination with Stochastic Weight Averaging (SWA) (arXiv:1803.05407) -- a cheap way of ensembling -- on a sentiment analysis task (SST-2). In particular, we analyze SWA's stability via CheckList criteria (arXiv:2005.04118), examining the agreement on errors made by models differing only in their random seed. We hypothesize that SWA is more stable because it ensembles model snapshots taken along the gradient descent trajectory. We quantify stability by comparing the models' mistakes with Fleiss' Kappa (Fleiss, 1971) and overlap ratio scores. We find that SWA reduces error rates in general; yet the models still suffer from their own distinct biases (according to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dense Connections · Softmax · LAMB · Layer Normalization · Stochastic Weight Averaging
