TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques

Ashok Urlana; Aditya Saibewar; Bala Mallikarjunarao Garlapati; Charaka; Vinayak Kumar; Ajeet Kumar Singh; Srinivasa Rao Chalamala

arXiv:2403.16592·cs.CL·March 26, 2024·1 cites

Open Access

TL;DR

This paper analyzes multiple techniques for detecting machine-generated text across different domains and languages, evaluating their effectiveness and highlighting future challenges in the field.

Contribution

It provides a comprehensive analysis of statistical, neural, and pre-trained model approaches for machine-generated text detection in a multilingual, multi-domain setting.

Findings

01

Achieved 86.9% accuracy on the subtask A monolingual test set

02

Achieved 83.7% accuracy on subtask B

03

Identified key challenges and factors for future research

Abstract

Large Language Models (LLMs) exhibit a remarkable ability to generate fluent content across a wide spectrum of user queries. However, this capability has raised concerns regarding misinformation and personal information leakage. In this paper, we present our methods for SemEval-2024 Task 8, aiming to detect machine-generated text across various domains in both mono-lingual and multi-lingual contexts. Our study comprehensively analyzes various methods to detect machine-generated text, including statistical, neural, and pre-trained model approaches. We also detail our experimental setup and perform an in-depth error analysis to evaluate the effectiveness of these methods. Our methods obtain an accuracy of 86.9% on the test set of subtask A (mono-lingual) and 83.7% on subtask B. Furthermore, we highlight the challenges and essential factors for consideration in future studies.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training