How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?
Ishani Mondal, Abhilasha Sancheti

TL;DR
This paper systematically evaluates ChatGPT's robustness in Named Entity Recognition under input perturbations, revealing its brittleness, overconfidence, and the impact of perturbations on explanation quality, with implications for its reliability in real-world applications.
Contribution
It provides a comprehensive analysis of ChatGPT's robustness in NER tasks under various input perturbations, highlighting its limitations and the effects on prediction confidence and explanation quality.
Findings
ChatGPT is more brittle when rare entities such as Drug or Disease are replaced than when widely known Person or Location entities are perturbed.
Explanation quality varies significantly with different perturbations.
ChatGPT tends to be overconfident in incorrect predictions.
Abstract
In this paper, we assess the robustness (reliability) of ChatGPT under input perturbations for one of the most fundamental tasks of Information Extraction (IE), i.e., Named Entity Recognition (NER). Despite the hype, with most researchers vouching for its language understanding and generation capabilities, little attention has been paid to understanding its robustness: how input perturbations affect 1) the predictions, 2) the confidence of predictions, and 3) the quality of the rationale behind its predictions. We perform a systematic analysis of ChatGPT's robustness (in both zero-shot and few-shot setups) on two NER datasets using both automatic and human evaluation. Based on automatic evaluation metrics, we find that 1) ChatGPT is more brittle on Drug or Disease replacements (rare entities) than on perturbations of widely known Person or Location entities, 2) the…
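To make the idea of an entity-replacement perturbation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the BIO tagging scheme, the function name `replace_entity`, and the example sentence are all assumptions made for illustration. The sketch swaps the first entity of a given type for another mention of the same type, so the gold label is unchanged while the surface form differs.

```python
def replace_entity(tokens, tags, target_type, replacement):
    """Swap the first entity span of `target_type` for `replacement`.

    Hypothetical helper for illustration only. `tokens` and `tags` are
    parallel lists, with tags in BIO format (e.g. "B-Drug", "I-Drug", "O").
    Returns new (tokens, tags) lists; the inputs are not modified.
    """
    new_tokens = replacement.split()
    if not new_tokens:
        raise ValueError("replacement must contain at least one token")

    out_tokens, out_tags = [], []
    i, replaced = 0, False
    while i < len(tokens):
        if not replaced and tags[i] == f"B-{target_type}":
            # Skip to the end of the original entity span.
            j = i + 1
            while j < len(tokens) and tags[j] == f"I-{target_type}":
                j += 1
            # Emit the replacement mention with matching BIO tags.
            out_tokens.extend(new_tokens)
            out_tags.append(f"B-{target_type}")
            out_tags.extend([f"I-{target_type}"] * (len(new_tokens) - 1))
            i, replaced = j, True
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags


# Assumed example sentence; the entity types mirror those named in the abstract.
tokens = ["Aspirin", "reduces", "fever", "in", "New", "York"]
tags = ["B-Drug", "O", "B-Disease", "O", "B-Location", "I-Location"]
perturbed_tokens, perturbed_tags = replace_entity(
    tokens, tags, "Drug", "sodium valproate"
)
```

A robustness check would then query the model on both the original and the perturbed sentence and compare the predicted entity spans against the (unchanged) gold types.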
Taxonomy
Topics: COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
