Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text
Lingyi Yang, Feng Jiang, Haizhou Li

TL;DR
This paper introduces the Polish Ratio, a novel measure to detect ChatGPT involvement in texts, and presents a new dataset for better detection of human-machine collaborative writing, improving robustness over existing methods.
Contribution
The paper proposes the Polish Ratio method and a new dataset, HPPT, to enhance detection of ChatGPT-involved texts, especially in human-machine collaboration scenarios.
Findings
The proposed detector outperforms existing detectors on multiple datasets.
The Polish Ratio quantifies how much ChatGPT has influenced a text.
Detection is more robust for human-machine collaborative writing.
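This summary does not state how the Polish Ratio is computed. As an illustration only, the sketch below assumes the degree of polishing can be approximated by how far a polished abstract drifts from its human-written original, using two standard text distances at the word level: a normalized Levenshtein distance and a Jaccard distance. The function names (levenshtein, jaccard_distance, polish_ratio_sketch) are hypothetical and are not taken from the paper or its code.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard_distance(a, b):
    """1 minus the overlap of the two vocabularies; 0.0 means identical word sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def polish_ratio_sketch(original, polished):
    """Assumed proxy for the degree of polishing; NOT the paper's definition.

    Both scores lie in [0, 1]: 0 means the text is unchanged, 1 means no
    words survive from the original.
    """
    o = original.lower().split()
    p = polished.lower().split()
    longest = max(len(o), len(p))
    norm_lev = levenshtein(o, p) / longest if longest else 0.0
    return {"levenshtein": norm_lev, "jaccard": jaccard_distance(o, p)}


if __name__ == "__main__":
    human = "we propose a new method to detect polished text"
    polished = "we introduce a novel method to detect polished text"
    print(polish_ratio_sketch(human, polished))
```

Under this assumption, a score near 0 suggests the text is close to the human original, while a score near 1 suggests it was almost entirely rewritten. A detector trained on HPPT-style pairs would have to predict such a score from the polished text alone, since the original is not available at inference time.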
Abstract
The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have impressed readers and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Despite this, most previous studies have been predominantly geared towards creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. This approach, however, fails to work on discerning texts generated through human-machine collaboration, such as ChatGPT-polished texts. Addressing this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), facilitating the construction of more robust detectors. It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Topic Modeling
