How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks
Xuanting Chen, Junjie Ye, Can Zu, Nuo Xu, Rui Zheng, Minlong Peng, Jie Zhou, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR
This paper evaluates GPT-3.5's robustness across diverse NLP tasks and transformations, revealing significant performance drops and specific robustness challenges, which are crucial for trustworthy AI development.
Contribution
It provides a comprehensive analysis of GPT-3.5's robustness using extensive datasets and transformations, highlighting its limitations and areas for improvement.
Findings
GPT-3.5's performance drops up to 43.59% under transformations
Identifies robustness issues like instability, prompt sensitivity, and number sensitivity
Highlights the need for robustness improvements for trustworthy AI
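As a rough illustration of how a performance drop under a transformation can be measured, the sketch below scores the same labeled samples before and after a text transformation and reports the difference. Everything in it is hypothetical: `toy_predict` stands in for a model, `toy_typo` stands in for a transformation (it is not a TextFlint API), and the drop is computed as an absolute accuracy difference, which is an assumption since this page does not state how the paper defines its drops.

```python
from typing import Callable, Sequence, Tuple


def accuracy(predict: Callable[[str], str],
             texts: Sequence[str],
             labels: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the gold label."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


def performance_drop(predict: Callable[[str], str],
                     transform: Callable[[str], str],
                     texts: Sequence[str],
                     labels: Sequence[str]) -> Tuple[float, float, float]:
    """Return (original accuracy, transformed accuracy, absolute drop).

    Assumption: the drop is the absolute difference between the two scores.
    """
    original = accuracy(predict, texts, labels)
    transformed = accuracy(predict, [transform(t) for t in texts], labels)
    return original, transformed, original - transformed


# Hypothetical stand-ins for a model and a transformation.
def toy_predict(text: str) -> str:
    return "positive" if "good" in text else "negative"


def toy_typo(text: str) -> str:
    # Swap two letters to simulate a typo-style perturbation.
    return text.replace("good", "godo")


texts = ["a good film", "good acting", "a dull plot", "bad pacing"]
labels = ["positive", "positive", "negative", "negative"]

orig_acc, trans_acc, drop = performance_drop(toy_predict, toy_typo, texts, labels)
print(f"original={orig_acc:.2f} transformed={trans_acc:.2f} drop={drop:.2f}")
```

The labels are left unchanged after the transformation, which only makes sense for perturbations that preserve meaning; a transformation that changes the correct answer would need its own gold labels.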
Abstract
The GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks, showcasing their strong understanding and reasoning capabilities. However, their robustness and abilities to handle various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI. In this study, we perform a comprehensive experimental analysis of GPT-3.5, exploring its robustness using 21 datasets (about 116K test samples) with 66 text transformations from TextFlint that cover 9 popular Natural Language Understanding (NLU) tasks. Our findings indicate that while GPT-3.5 outperforms existing fine-tuned models on some tasks, it still encounters significant robustness degradation, such as its average performance dropping by up to 35.74% and 43.59% in natural language…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Test · Linear Layer · Softmax · Attention Dropout · Adam · Cosine Annealing · Linear Warmup With Cosine Annealing
