Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Carlos Aspillaga, Andrés Carvallo, Vladimir Araujo

TL;DR
This study systematically evaluates the robustness of Transformer-based models such as RoBERTa, XLNet, and BERT under stress tests on Natural Language Inference (NLI) and Question Answering (QA) tasks. It finds that they are more robust than earlier non-Transformer models but remain fragile and prone to unexpected behaviors.
Contribution
First comprehensive stress test evaluation of Transformer models in NLP tasks, highlighting their robustness and remaining vulnerabilities.
Findings
Transformer models are more robust than RNNs under stress.
They still exhibit significant fragility and unexpected behaviors.
There is clear room for further improvements in model robustness.
Abstract
There has been significant progress in recent years in the field of Natural Language Processing thanks to the introduction of the Transformer architecture. Current state-of-the-art models, with a large number of parameters and pre-training on massive text corpora, have shown impressive results on several downstream tasks. Many researchers have studied previous (non-Transformer) models to understand their actual behavior under different scenarios, showing that these models exploit clues or flaws in the datasets and that slight perturbations of the input data can severely reduce their performance. In contrast, recent models have not been systematically tested with adversarial examples to show their robustness under severe stress conditions. For that reason, this work evaluates three Transformer-based models (RoBERTa, XLNet, and BERT) in Natural Language Inference…
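The core idea of a stress test, measuring how much accuracy drops when inputs are perturbed in a way that should not change the gold label, can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the perturbation that appends the tautology "and true is true" to an NLI hypothesis, the toy word-overlap model, and all function names (append_tautology, toy_overlap_model, stress_test) are hypothetical stand-ins chosen so the example runs without any pretrained model.

```python
from typing import Callable, List, Tuple

# An NLI example: (premise, hypothesis, gold label).
Example = Tuple[str, str, str]


def append_tautology(premise: str, hypothesis: str) -> Tuple[str, str]:
    """Hypothetical label-preserving perturbation: append a tautology
    to the hypothesis. The gold label should stay the same."""
    return premise, hypothesis + " and true is true"


def toy_overlap_model(premise: str, hypothesis: str) -> str:
    """Hypothetical lexical-overlap baseline: predicts 'entailment' iff
    every hypothesis token also appears in the premise."""
    premise_tokens = set(premise.lower().split())
    if all(tok in premise_tokens for tok in hypothesis.lower().split()):
        return "entailment"
    return "neutral"


def accuracy(model: Callable[[str, str], str], data: List[Example]) -> float:
    """Fraction of examples where the model predicts the gold label."""
    correct = sum(model(p, h) == y for p, h, y in data)
    return correct / len(data)


def stress_test(model, data, perturb):
    """Return (original accuracy, stressed accuracy, accuracy drop)."""
    stressed_data = [(*perturb(p, h), y) for p, h, y in data]
    base = accuracy(model, data)
    under_stress = accuracy(model, stressed_data)
    return base, under_stress, base - under_stress


data = [
    ("a man is playing a guitar", "a man is playing", "entailment"),
    ("a dog runs in the park", "a cat sleeps", "neutral"),
]
base, stressed, drop = stress_test(toy_overlap_model, data, append_tautology)
print(base, stressed, drop)  # 1.0 0.5 0.5
```

The toy model relies purely on lexical overlap, so the appended words flip its correct "entailment" prediction to "neutral" even though the meaning is unchanged. A robust model should show a drop close to zero; substituting a real NLI model for toy_overlap_model gives the same kind of measurement.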
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · RoBERTa · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · SentencePiece · Byte Pair Encoding
