Testing the Reliability of ChatGPT for Text Annotation and Classification: A Cautionary Remark
Michael V. Reiss

TL;DR
This study evaluates ChatGPT's reliability in zero-shot text annotation and classification, revealing significant variability in outputs due to prompt variations and repetitions, thus cautioning against unsupervised use without validation.
Contribution
It systematically assesses ChatGPT's consistency for text classification, highlighting the importance of validation and the limitations of its unsupervised application.
Findings
ChatGPT's outputs vary with prompt wording and repetitions.
Pooling multiple outputs can improve reliability.
Unsupervised use of ChatGPT for text annotation is not recommended.
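The pooling idea in the findings can be illustrated with a minimal sketch. It assumes pooling means taking a majority vote over repeated classifications of the same text, which may differ from the exact procedure used in the paper. The function names, the tie handling, and the example labels are hypothetical and chosen only for illustration; no real model outputs are shown.

```python
from collections import Counter


def pool_by_majority(labels):
    """Return the most frequent label among repeated outputs for one text.

    Returns None on a tie, so ambiguous cases can be sent to a human coder.
    (Tie handling is an assumption for this sketch, not taken from the paper.)
    """
    counts = Counter(labels)
    top_count = max(counts.values())
    winners = [label for label, count in counts.items() if count == top_count]
    return winners[0] if len(winners) == 1 else None


def share_unanimous(runs_per_text):
    """Fraction of texts for which all repeated runs produced the same label."""
    unanimous = sum(1 for labels in runs_per_text if len(set(labels)) == 1)
    return unanimous / len(runs_per_text)


# Hypothetical example: four texts, each classified three times with identical input.
repeated_outputs = [
    ["news", "news", "news"],
    ["news", "not news", "news"],
    ["not news", "not news", "not news"],
    ["not news", "news", "not news"],
]

pooled = [pool_by_majority(labels) for labels in repeated_outputs]
print(pooled)
print(share_unanimous(repeated_outputs))
```

A simple unanimity share like this is only a rough consistency check. A chance-corrected reliability coefficient computed against human-coded validation data would be the more rigorous choice.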
Abstract
Recent studies have demonstrated the promising potential of ChatGPT for various text annotation and classification tasks. However, ChatGPT is non-deterministic, which means that, as with human coders, identical input can lead to different outputs. Given this, it seems appropriate to test the reliability of ChatGPT. Therefore, this study investigates the consistency of ChatGPT's zero-shot capabilities for text annotation and classification, focusing on different model parameters, prompt variations, and repetitions of identical inputs. Based on the real-world classification task of differentiating website texts into news and not news, results show that consistency in ChatGPT's classification output can fall short of scientific thresholds for reliability. For example, even minor wording alterations in prompts or repeating the identical input can lead to varying outputs. Although pooling outputs…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Test
