MUGC: Machine Generated versus User Generated Content Detection

Yaqi Xie; Anjali Rawal; Yujing Cen; Dixuan Zhao; Sunil K Narang; Shanu; Sushmita

arXiv:2403.19725 · cs.CL · April 1, 2024 · 1 citation

TL;DR

This paper evaluates traditional machine learning methods for distinguishing machine-generated content from human-generated content across diverse datasets, highlighting the effectiveness of semantic and stylistic features in detection.

Contribution

It offers a comparative analysis of eight traditional algorithms and explores semantic and stylistic features for improved detection of machine-generated text.

Findings

01. High accuracy of traditional methods in identifying machine-generated data

02. Shorter length and less word variety in machine-generated texts

03. Deeper semantic features like word2vec improve detection performance
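
To make the second finding concrete, the following is a minimal sketch of two stylistic features, text length in tokens and type-token ratio (distinct words divided by total words), as one way to quantify "shorter length and less word variety". The function name `stylistic_features`, the regex tokenizer, and the two sample strings are illustrative assumptions and are not taken from the paper, which may compute these properties differently.

```python
import re


def stylistic_features(text: str) -> dict:
    """Return simple length and word-variety statistics for one text.

    Hypothetical helper for illustration; not the paper's feature extractor.
    """
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    n_tokens = len(tokens)
    n_types = len(set(tokens))  # number of distinct tokens
    # Type-token ratio: share of tokens that are distinct (0.0 for empty text).
    ttr = n_types / n_tokens if n_tokens else 0.0
    return {"n_tokens": n_tokens, "n_types": n_types, "type_token_ratio": ttr}


if __name__ == "__main__":
    # Made-up sample strings, used only to exercise the function.
    varied = "The river bends where the old mill once stood, and herons wade at dusk."
    repetitive = "The river is nice. The river is calm. The river is nice and calm."
    for label, sample in (("varied", varied), ("repetitive", repetitive)):
        print(label, stylistic_features(sample))
```

Type-token ratio tends to fall as texts get longer, so comparisons are most meaningful between texts of similar length or after truncating to a fixed number of tokens.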

Abstract

As advanced modern systems like deep neural networks (DNNs) and generative AI continue to enhance their capabilities in producing convincing and realistic content, the need to distinguish between user-generated and machine-generated content is becoming increasingly evident. In this research, we undertake a comparative evaluation of eight traditional machine-learning algorithms to distinguish between machine-generated and human-generated data across three diverse datasets: Poems, Abstracts, and Essays. Our results indicate that traditional methods demonstrate a high level of accuracy in identifying machine-generated data, reflecting the documented effectiveness of popular pre-trained models like RoBERTa. We note that machine-generated texts tend to be shorter and exhibit less word variety compared to human-generated content. While specific domain-related keywords commonly utilized by…

Taxonomy

Topics: Web Data Mining and Analysis