How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu

TL;DR
This paper introduces HC3, a large dataset comparing ChatGPT and human responses across various domains, analyzes their differences, and develops detection methods to distinguish AI-generated from human text.
Contribution
It provides the HC3 dataset for evaluating ChatGPT versus human responses and proposes effective detection systems for AI-generated text.
Findings
ChatGPT responses differ significantly from human experts in style and content.
Detection systems can effectively identify AI-generated text with high accuracy.
Linguistic and content analyses reveal key gaps between ChatGPT and human responses.
Abstract
The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT…
Code & Models
- 🤗 Hello-SimpleAI/chatgpt-detector-roberta (model) · 26k dl · ♡ 60
- 🤗 Hello-SimpleAI/chatgpt-detector-roberta-chinese (model) · 1.2k dl · ♡ 25
- 🤗 Hello-SimpleAI/chatgpt-qa-detector-roberta (model) · 768 dl · ♡ 1
- 🤗 Hello-SimpleAI/chatgpt-qa-detector-roberta-chinese (model) · 97 dl · ♡ 4
- 🤗 mrm8488/xlm-roberta-base-finetuned-HC3-mix (model) · 17 dl · ♡ 8
- 🤗 afroz14/demomodel (model) · 2 dl
- 🤗 Seiriryu/chatgpt-qa-detector-roberta (model) · 23 dl · ♡ 1
- 🤗 etagaca/verifai-detector-roberta (model) · 5 dl
- 🤗 devloverumar/chatgpt-content-detector (model) · 32 dl
- 🤗 VSAsteroid/ai-text-detector-hc3 (model) · 49 dl
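
As a rough illustration of how one of the detector checkpoints listed above might be tried out, here is a minimal sketch using the Hugging Face `transformers` text-classification pipeline. It assumes that `transformers` and a backend such as PyTorch are installed and that the checkpoint loads as a standard sequence-classification model. The helper names (`load_detector`, `classify_texts`, `demo`) are hypothetical and not part of the paper or the model cards, and the label strings a given checkpoint returns are not assumed here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier takes a list of texts and returns one {"label", "score"} dict per text,
# which is the shape the transformers text-classification pipeline returns for list input.
Classifier = Callable[[List[str]], List[Dict[str, object]]]


def load_detector(model_id: str = "Hello-SimpleAI/chatgpt-detector-roberta") -> Classifier:
    """Load a detector checkpoint from the Hugging Face Hub (downloads on first use)."""
    # Imported lazily so the rest of this sketch works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_texts(texts: Sequence[str], classifier: Classifier) -> List[Tuple[str, str, float]]:
    """Return (text, predicted label, score) for each input text."""
    results = classifier(list(texts))
    return [(text, str(res["label"]), float(res["score"])) for text, res in zip(texts, results)]


def demo() -> None:
    """Hypothetical usage; the printed labels depend on the checkpoint's own config."""
    detector = load_detector()
    samples = [
        "I'd check with your doctor first, but honestly it worked fine for me.",
        "There are several factors to consider when evaluating this question.",
    ]
    for text, label, score in classify_texts(samples, detector):
        print(f"{label} ({score:.3f}): {text}")
```

Calling `demo()` downloads the checkpoint and prints one prediction per sample. The classifier is passed in as an argument so that any of the other listed checkpoints, or a stub, can be swapped in without changing `classify_texts`. Very long inputs may need truncation to fit the model's maximum sequence length.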
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Text Readability and Simplification
