Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis

Wataru Zaitsu; Mingzhe Jin

arXiv:2304.05534·cs.CL·June 6, 2023·1 cites

Open Access

TL;DR

This study analyzes Japanese stylometric features to distinguish texts generated by GPT-3.5 and GPT-4 from human-written papers, demonstrating high classification accuracy using machine learning techniques.

Contribution

It introduces a novel Japanese stylometric analysis approach and demonstrates effective classification of AI-generated versus human texts with high accuracy.

Findings

01

Distinct stylometric distributions for GPT and human texts identified.

02

A random forest classifier achieved 100% accuracy in distinguishing GPT-generated from human-written texts.

03

GPT-4's increased parameters do not make its texts more human-like in stylometric features.

Abstract

In the first half of 2023, text-generative artificial intelligence (AI), including ChatGPT, equipped with GPT-3.5 and GPT-4, from OpenAI, has attracted considerable attention worldwide. In this study, first, we compared Japanese stylometric features of texts generated by GPT (-3.5 and -4) and those written by humans. In this work, we performed multi-dimensional scaling (MDS) to confirm the distributions of 216 texts of three classes (72 academic papers written by 36 single authors, 72 texts generated by GPT-3.5, and 72 texts generated by GPT-4 on the basis of the titles of the aforementioned papers) focusing on the following stylometric features: (1) bigrams of parts-of-speech, (2) bigram of postpositional particle words, (3) positioning of commas, and (4) rate of function words. MDS revealed distinct distributions at each stylometric feature of GPT (-3.5 and -4) and human. Although…
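As an illustration of features (1), (2), and (4) above, the following is a minimal sketch of how bigram rates and function-word rates might be computed for one text. It assumes the text has already been tokenized and tagged by a Japanese morphological analyzer. The function names, the tag labels, and the small function-word set are hypothetical choices for this example and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def bigram_rates(items: List[str]) -> Dict[Tuple[str, str], float]:
    """Relative frequency of each adjacent pair in a sequence.

    `items` may be part-of-speech tags (feature 1) or the sequence of
    postpositional particles found in the text (feature 2).
    """
    pairs = list(zip(items, items[1:]))
    if not pairs:
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}


def function_word_rates(tokens: List[str],
                        function_words: Iterable[str]) -> Dict[str, float]:
    """Share of all tokens taken up by each listed function word (feature 4)."""
    if not tokens:
        return {w: 0.0 for w in function_words}
    counts = Counter(tokens)
    total = len(tokens)
    return {w: counts[w] / total for w in function_words}


# Hypothetical tagged sentence: tokens and their part-of-speech labels.
tokens = ["私", "は", "本", "を", "読む"]
tags = ["noun", "particle", "noun", "particle", "verb"]

pos_features = bigram_rates(tags)
fw_features = function_word_rates(tokens, ["は", "を", "が"])
```

Each text then becomes one feature vector (the rates, aligned on a shared set of bigrams or function words), which is the kind of input that distance-based methods such as MDS or a classifier such as random forest can work with.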

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Dense Connections · Attention Dropout