GenAI Content Detection Task 2: AI vs. Human -- Academic Essay Authenticity Challenge

Shammur Absar Chowdhury; Hind Almerekhi; Mucahid Kutlu; Kaan Efe Keles; Fatema Ahmad; Tasnim Mohiuddin; George Mikros; Firoj Alam

arXiv:2412.18274·cs.CL·December 25, 2024

Open Access

TL;DR

This paper reviews the first Academic Essay Authenticity Challenge, highlighting advances in AI vs. human essay detection with high accuracy, driven by transformer models and LLMs across English and Arabic.

Contribution

It introduces a new benchmark dataset and evaluation framework for AI-generated essay detection, showcasing state-of-the-art results and diverse approaches from multiple teams.

Findings

01. Top systems achieved F1 scores over 0.98
02. Transformer-based models significantly outperformed baselines
03. Both English and Arabic detection tasks showed high accuracy
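
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how an F1 score can be computed for a two-class essay detection task. It is an illustration only: this page does not say which F1 variant the challenge used, so the sketch assumes plain binary F1 with the machine-generated class as the positive class. The label strings `"ai"` and `"human"`, the function name `f1_score`, and the toy data are all hypothetical.

```python
def f1_score(gold, pred, positive="ai"):
    """Binary F1 for one positive class (assumed here to be 'ai')."""
    pairs = list(zip(gold, pred))
    # True positives: essays correctly flagged as machine-generated.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: human essays wrongly flagged as machine-generated.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: machine-generated essays that were missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the challenge.
gold = ["ai", "ai", "human", "human", "ai"]
pred = ["ai", "human", "human", "ai", "ai"]
score = f1_score(gold, pred)
```

In this toy case two machine-generated essays are caught, one is missed, and one human essay is wrongly flagged, so precision and recall are both 2/3 and the F1 score is 2/3. A score above 0.98 therefore implies that both kinds of error are rare.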

Abstract

This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs. human-authored essays for academic purposes. The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task. Finally, seven teams submitted system description papers. The majority of submissions utilized fine-tuned transformer-based models, with one team employing Large Language Models (LLMs) such as Llama 2 and Llama 3. This paper outlines the task formulation, details the dataset…

Taxonomy

Topics: Topic Modeling

Methods: LLaMA