DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts

Anastasia Voznyuk; Vasily Konovalov

arXiv:2405.10629 · cs.CL · May 20, 2024

TL;DR

This paper applies transfer learning with DeBERTaV3 to detect the boundary between human-written and machine-generated text, achieving the best MAE score on the SemEval-2024 Task 8 leaderboard.

Contribution

The paper presents a data augmentation pipeline for supervised fine-tuning of DeBERTaV3 on boundary detection in hybrid human-AI texts.

Findings

01

Achieved the best MAE score in the competition

02

Demonstrated effectiveness of data augmentation for boundary detection

03

Enhanced detection accuracy over existing methods

Abstract

The Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection shared task in the SemEval-2024 competition aims to tackle the problem of misusing collaborative human-AI writing. Although many detectors of AI content exist, they are often designed to give a binary answer and thus may not be suitable for the more nuanced problem of finding the boundary between human-written and machine-generated text, even as hybrid human-AI writing becomes increasingly popular. In this paper, we address the boundary detection problem. In particular, we present a pipeline for augmenting data for supervised fine-tuning of DeBERTaV3. With this pipeline, we achieve a new best MAE score according to the competition leaderboard.
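
The abstract reports results as a mean absolute error (MAE) score. As a minimal sketch of how such a score could be computed, assuming each text has a single boundary expressed as an integer position (for example, a word index) and that the score averages the absolute distance between predicted and true positions, the Python below may help. The function name `boundary_mae` and the sample values are hypothetical and are not taken from the paper or the task's official scorer.

```python
from typing import Sequence


def boundary_mae(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Mean absolute error between predicted and gold boundary positions.

    Assumption: one integer boundary position per text, where the position
    marks the point at which machine-generated text starts.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if len(gold) == 0:
        raise ValueError("at least one example is required")
    # Sum the per-text absolute distances, then average over all texts.
    total_error = sum(abs(p - g) for p, g in zip(predicted, gold))
    return total_error / len(gold)


# Hypothetical boundary positions for three texts.
gold_positions = [12, 40, 7]
predicted_positions = [10, 40, 11]

# Per-text errors are 2, 0 and 4, so the mean is 2.0.
print(boundary_mae(predicted_positions, gold_positions))
```

Under this reading, a lower MAE means predicted boundaries fall closer to the true ones, and a score of 0 means every boundary was located exactly.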

Code & Models

Repositories

natriistorm/semeval2024-boundary-detection (Official)

Videos

DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Masked autoencoder