Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis
Wataru Zaitsu, Mingzhe Jin

TL;DR
This study analyzes Japanese stylometric features to distinguish texts generated by GPT-3.5 and GPT-4 from human-written papers, demonstrating high classification accuracy using machine learning techniques.
Contribution
It introduces a novel Japanese stylometric analysis approach and demonstrates effective classification of AI-generated versus human texts with high accuracy.
Findings
MDS showed distinct distributions for GPT-generated and human-written texts on each stylometric feature.
A random forest classifier achieved 100% accuracy in distinguishing AI-generated from human-written texts.
GPT-4's increased parameters do not make its texts more human-like in stylometric features.
Abstract
In the first half of 2023, text-generative artificial intelligence (AI), including ChatGPT, equipped with GPT-3.5 and GPT-4, from OpenAI, attracted considerable attention worldwide. In this study, we first compared Japanese stylometric features of texts generated by GPT (-3.5 and -4) and those written by humans. We performed multi-dimensional scaling (MDS) to confirm the distributions of 216 texts of three classes (72 academic papers written by 36 single authors, 72 texts generated by GPT-3.5, and 72 texts generated by GPT-4 on the basis of the titles of the aforementioned papers), focusing on the following stylometric features: (1) bigrams of parts-of-speech, (2) bigrams of postpositional particle words, (3) positioning of commas, and (4) rate of function words. MDS revealed distinct distributions for GPT (-3.5 and -4) and human texts on each stylometric feature. Although…
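As a rough illustration of two of the feature types listed in the abstract (bigrams of parts-of-speech and rate of function words), the sketch below computes relative frequencies from text that is assumed to be already tokenized and POS-tagged. The function names, the tag labels, and the Euclidean distance are illustrative assumptions, not the authors' implementation; the summary does not state which tagger or distance measure the paper used for MDS.

```python
from collections import Counter
from math import sqrt


def bigram_rates(tags):
    """Relative frequency of each adjacent tag pair in one text.

    `tags` is a list of POS labels (hypothetical labels, assumed to
    come from an external Japanese morphological analyzer).
    """
    pairs = list(zip(tags, tags[1:]))
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}


def function_word_rates(tokens, function_words):
    """Share of all tokens taken up by each listed function word."""
    if not tokens:
        return {}
    counts = Counter(t for t in tokens if t in function_words)
    return {w: counts[w] / len(tokens) for w in function_words}


def euclidean(a, b):
    """Euclidean distance between two sparse feature dicts.

    A pairwise distance matrix built this way could be passed to an
    MDS routine; the choice of Euclidean distance is an assumption.
    """
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


# Toy inputs, not data from the paper.
tags = ["noun", "particle", "verb", "noun", "particle", "verb"]
pos_features = bigram_rates(tags)
fw_features = function_word_rates(["は", "猫", "が", "は"], ["は", "が"])
```

Each text becomes one sparse feature vector, so texts of different lengths remain comparable because the values are rates rather than raw counts.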
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
Methods: Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Dense Connections · Attention Dropout
