How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks

Xuanting Chen; Junjie Ye; Can Zu; Nuo Xu; Rui Zheng; Minlong Peng; Jie Zhou; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2303.00293·cs.CL·March 2, 2023·35 cites

Open Access

TL;DR

This paper evaluates GPT-3.5's robustness across diverse NLP tasks and transformations, revealing significant performance drops and specific robustness challenges, which are crucial for trustworthy AI development.

Contribution

It provides a comprehensive analysis of GPT-3.5's robustness using extensive datasets and transformations, highlighting its limitations and areas for improvement.

Findings

01. GPT-3.5's average performance drops by up to 43.59% under text transformations.

02. The study identifies specific robustness issues, including instability, prompt sensitivity, and number sensitivity.

03. The results highlight the need for robustness improvements before such models can be considered trustworthy AI.

Abstract

The GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks, showcasing their strong understanding and reasoning capabilities. However, their robustness and abilities to handle various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI. In this study, we perform a comprehensive experimental analysis of GPT-3.5, exploring its robustness using 21 datasets (about 116K test samples) with 66 text transformations from TextFlint that cover 9 popular Natural Language Understanding (NLU) tasks. Our findings indicate that while GPT-3.5 outperforms existing fine-tuned models on some tasks, it still encounters significant robustness degradation, such as its average performance dropping by up to 35.74% and 43.59% in natural language…
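The evaluation idea in the abstract (score a model on original inputs, score it again on transformed inputs, and report the drop) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword classifier stands in for a real model, `swap_adjacent_chars` is a toy perturbation rather than a TextFlint transformation, and the drop is computed as an absolute difference in accuracy, which may not match the metric the paper reports.

```python
from typing import Callable, Sequence, Tuple


def accuracy(predict: Callable[[str], str],
             texts: Sequence[str],
             labels: Sequence[str]) -> float:
    """Fraction of texts whose predicted label matches the gold label."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def swap_adjacent_chars(text: str) -> str:
    """Toy perturbation (hypothetical): swap the first two characters
    of every word longer than three characters."""
    words = []
    for w in text.split():
        if len(w) > 3:
            w = w[1] + w[0] + w[2:]
        words.append(w)
    return " ".join(words)


def robustness_drop(predict: Callable[[str], str],
                    transform: Callable[[str], str],
                    texts: Sequence[str],
                    labels: Sequence[str]) -> Tuple[float, float, float]:
    """Return (original accuracy, transformed accuracy, absolute drop)."""
    ori = accuracy(predict, texts, labels)
    trans = accuracy(predict, [transform(t) for t in texts], labels)
    return ori, trans, ori - trans


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for a model: a keyword sentiment lookup."""
    lowered = text.lower()
    return "positive" if "good" in lowered or "great" in lowered else "negative"


texts = ["a good movie", "great acting overall", "a dull plot", "bad pacing"]
labels = ["positive", "positive", "negative", "negative"]

ori, trans, drop = robustness_drop(
    keyword_classifier, swap_adjacent_chars, texts, labels
)
print(f"original={ori:.2f} transformed={trans:.2f} drop={drop:.2f}")
```

On this toy data the perturbation breaks both positive keywords, so accuracy falls from 1.00 to 0.50. A real study would replace the stand-in classifier with model calls and the toy perturbation with task-specific transformations.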

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education

Methods: Attention Is All You Need · Linear Layer · Softmax · Attention Dropout · Adam · Cosine Annealing · Linear Warmup With Cosine Annealing