Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Lingyi Yang; Feng Jiang; Haizhou Li

arXiv:2307.11380 · cs.CL · January 2, 2024 · 1 citation



TL;DR

This paper introduces the Polish Ratio, a measure of how much ChatGPT has modified a human-written text. It also presents a new dataset of human-written and ChatGPT-polished abstracts for detecting human-machine collaborative writing, yielding detectors that are more robust than existing methods.

Contribution

The paper proposes the Polish Ratio method and a new dataset, HPPT, to enhance detection of ChatGPT-involved texts, especially in human-machine collaboration scenarios.

Findings

1. The proposed detector outperforms existing detectors on multiple datasets.
2. The Polish Ratio effectively quantifies the degree of ChatGPT involvement in a text.
3. Detection is more robust in human-machine collaborative writing scenarios.

Abstract

The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have impressed readers and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Despite this, most previous studies have focused on creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. This approach, however, fails to discern texts generated through human-machine collaboration, such as ChatGPT-polished texts. Addressing this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), facilitating the construction of more robust detectors. It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the…
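The abstract is cut off before the Polish Ratio is defined, so the following is only a hedged illustration, not the authors' code. It assumes the ratio is a normalized distance between a human-written abstract and its ChatGPT-polished counterpart, with 0 meaning unchanged and 1 meaning fully rewritten. The record type `PolishedPair`, the function `polish_ratio`, the whitespace tokenization, and both distance functions (Jaccard distance over word sets, and word-level Levenshtein distance divided by the longer length) are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolishedPair:
    """Hypothetical HPPT-style record: a human abstract and its ChatGPT-polished version."""
    human: str
    polished: str


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lower-cased whitespace tokens (word order ignored)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0  # two empty texts: treat as unchanged
    return 1.0 - len(set_a & set_b) / len(union)


def normalized_levenshtein(a: str, b: str) -> float:
    """Word-level edit distance divided by the length of the longer token sequence."""
    x, y = a.lower().split(), b.lower().split()
    if not x and not y:
        return 0.0
    # Classic dynamic program, keeping only the previous row.
    prev = list(range(len(y) + 1))
    for i, tok_x in enumerate(x, start=1):
        curr = [i] + [0] * len(y)
        for j, tok_y in enumerate(y, start=1):
            cost = 0 if tok_x == tok_y else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / max(len(x), len(y))


def polish_ratio(pair: PolishedPair, metric=jaccard_distance) -> float:
    """Hypothetical Polish Ratio: 0 = unchanged, 1 = fully rewritten."""
    return metric(pair.human, pair.polished)


if __name__ == "__main__":
    pair = PolishedPair("we propose a new dataset", "we introduce a novel dataset")
    print(polish_ratio(pair))                          # 4/7 ≈ 0.571
    print(polish_ratio(pair, normalized_levenshtein))  # 2 substitutions / 5 words = 0.4
```

Token-set Jaccard distance ignores word order, while the edit distance also penalizes reordering. Which measure (if either) matches the paper's actual definition should be checked against the full text.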


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Topic Modeling