ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

Taja Kuzman; Igor Mozetič; Nikola Ljubešić

arXiv:2303.03953·cs.CL·March 9, 2023·66 cites

TL;DR

This study evaluates ChatGPT's zero-shot capabilities for automatic genre identification across English and Slovenian, showing it can outperform fine-tuned models and potentially reduce manual annotation efforts, despite some language limitations.

Contribution

It demonstrates ChatGPT's effectiveness in zero-shot genre classification, highlighting its potential to replace manual annotation especially for under-resourced languages.

Findings

1. ChatGPT outperforms the fine-tuned XLM-RoBERTa model on a dataset that neither model had seen before.

2. With English prompts, ChatGPT performs no worse on Slovenian texts than on English texts.

3. Performance drops significantly when the model is prompted fully in Slovenian, which points to current limitations for this language.

Abstract

ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used for zero-shot text classification, more specifically, automatic genre identification. We compare ChatGPT with a multilingual XLM-RoBERTa language model that was fine-tuned on datasets, manually annotated with genres. The models are compared on test sets in two languages: English and Slovenian. Results show that ChatGPT outperforms the fine-tuned model when applied to the dataset which was not seen before by either of the models. Even when applied on Slovenian language as an under-resourced language, ChatGPT's performance is no worse than when applied to English. However, if the model is fully prompted in Slovenian, the performance drops significantly, showing the current limitations of…
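The zero-shot setup described in the abstract can be illustrated with a minimal sketch: build an English prompt that lists the candidate genres, map the model's free-form reply back to a label, and score the predictions against manual annotations. Everything specific here is an assumption made for illustration and is not taken from the paper: the label set `GENRES`, the prompt wording, the helper names `build_prompt`, `parse_answer`, and `macro_f1`, and the choice of macro-averaged F1 as the metric. The call to the chat model itself is left out.

```python
# Hypothetical label set; the page above does not list the genre categories.
GENRES = ["News", "Opinion", "Instruction", "Legal", "Other"]


def build_prompt(text, genres=GENRES):
    """Compose an English zero-shot prompt for a single text (assumed wording)."""
    options = ", ".join(genres)
    return (
        "Classify the following text into exactly one genre category.\n"
        f"Categories: {options}.\n"
        f"Text: {text}\n"
        "Answer with the category name only."
    )


def parse_answer(reply, genres=GENRES, fallback="Other"):
    """Map a free-form reply to the first known label it mentions.

    Matching is case-insensitive; replies that mention no known label
    fall back to `fallback`.
    """
    lowered = reply.lower()
    for genre in genres:
        if genre.lower() in lowered:
            return genre
    return fallback


def macro_f1(gold, predicted):
    """Macro-averaged F1 over every label seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)
```

In use, each reply from the chat model would be passed through `parse_answer`, and the resulting labels compared with the manual annotations via `macro_f1`. The same scoring function could be applied to the fine-tuned model's predictions so that both systems are measured identically.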

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Artificial Intelligence in Healthcare and Education

Methods: Test