Multi-head Span-based Detector for AI-generated Fragments in Scientific Papers

German Gritsai; Ildar Khabutdinov; Andrey Grabovoy

arXiv:2411.07343 · cs.CL · November 19, 2024


TL;DR

This paper introduces a multi-head span-based detector that uses multi-task learning to identify AI-generated fragments in scientific papers, improving the macro F1-score by 9% over the baseline.

Contribution

It proposes a novel multi-task learning architecture with two heads for token-level classification of AI-generated text in scientific documents.
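The summary does not say what each of the two heads predicts. As a rough illustration of the general pattern only, the sketch below places two linear classification heads on the same per-token state vectors and combines their cross-entropy losses into one multi-task objective. The second head's target (a hypothetical span-start label derived from the main labels), the names `TwoHeadTokenClassifier` and `boundary_labels`, and the `aux_weight` value are assumptions, not details from the paper. Encoder states are passed in as plain lists so the sketch runs without a deep-learning framework.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine map W x + b applied to one token state vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def log_softmax(z: Vector) -> Vector:
    """Numerically stable log-softmax over class logits."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def boundary_labels(labels: List[int]) -> List[int]:
    """Hypothetical auxiliary target: 1 where a new label run starts, else 0."""
    return [1 if i == 0 or labels[i] != labels[i - 1] else 0 for i in range(len(labels))]


class TwoHeadTokenClassifier:
    """Two linear heads sharing the same per-token encoder states (illustrative)."""

    def __init__(self, hidden: int, n_main: int = 2, n_aux: int = 2, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(rows)]

        # Main head: human-written (0) vs AI-generated (1) for each token.
        self.w_main, self.b_main = init(n_main), [0.0] * n_main
        # Auxiliary head: hypothetical span-start label (an assumption).
        self.w_aux, self.b_aux = init(n_aux), [0.0] * n_aux

    def forward(self, states: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
        """Return per-token log-probabilities from both heads."""
        main = [log_softmax(linear(h, self.w_main, self.b_main)) for h in states]
        aux = [log_softmax(linear(h, self.w_aux, self.b_aux)) for h in states]
        return main, aux

    def loss(self, states: List[Vector], main_labels: List[int],
             aux_labels: List[int], aux_weight: float = 0.5) -> float:
        """Multi-task objective: mean main NLL plus weighted mean auxiliary NLL."""
        main, aux = self.forward(states)
        n = len(states)
        nll_main = -sum(lp[y] for lp, y in zip(main, main_labels)) / n
        nll_aux = -sum(lp[y] for lp, y in zip(aux, aux_labels)) / n
        return nll_main + aux_weight * nll_aux
```

In a real system the state vectors would come from a Transformer-based encoder and both heads would be trained jointly by backpropagation through the shared encoder; the sketch shows only the forward pass and the combined objective.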

Findings

1. Achieved a 9% improvement in macro F1-score over the baseline.
2. Reached an F1-score of 0.96 on the competition's test set.
3. Demonstrated the effectiveness of multi-head span-based detection in scientific text.
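The reported gains are measured in macro F1, the unweighted mean of per-class F1 scores, so the human-written and AI-generated classes count equally regardless of how many tokens each has. A minimal pure-Python sketch for a single sequence of token labels follows; how the competition averages scores across documents is not described here and is not modelled. For the same inputs, scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred), key=str)
    if not classes:
        raise ValueError("at least one label is required")
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with `y_true = ["human", "human", "human", "ai", "ai", "ai"]` and one human token mispredicted as AI, the per-class F1 scores are 6/7 (AI) and 4/5 (human), so the macro F1 is 29/35 ≈ 0.829.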

Abstract

This paper describes a system designed to distinguish between AI-generated and human-written scientific excerpts in the DAGPap24 competition hosted within the Fourth Workshop on Scientific Document Processing. In this competition, the task is to find artificially generated token-level text fragments in documents from the scientific domain. Our work focuses on a multi-task learning architecture with two heads. This approach is motivated by the specifics of the task, in which class spans are continuous over several hundred characters. We considered different encoder variants for obtaining a state vector for each token in the sequence, as well as different ways of splitting fragments into tokens before feeding them into a Transformer-based encoder. This approach allows us to achieve a 9% quality improvement relative to the baseline solution score on the…
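Because the labels are defined over character spans while the encoder works on tokens, labels have to be moved between the two representations. The sketch below assumes simple whitespace word splitting and a majority-coverage rule for labelling each token; both are illustrative choices, not the tokenization variants evaluated in the paper. With a subword tokenizer, per-token character offsets would come from the tokenizer itself (for example, Hugging Face fast tokenizers return them when called with `return_offsets_mapping=True`). The helper names are hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character interval [start, end)


def word_tokens(text: str) -> List[Span]:
    """Split on whitespace, keeping each word's character offsets."""
    return [m.span() for m in re.finditer(r"\S+", text)]


def char_spans_to_token_labels(tokens: List[Span], ai_spans: List[Span]) -> List[int]:
    """Label a token 1 if more than half of its characters lie in an AI span."""
    labels = []
    for start, end in tokens:
        covered = sum(max(0, min(end, e) - max(start, s)) for s, e in ai_spans)
        labels.append(1 if 2 * covered > end - start else 0)
    return labels


def token_labels_to_char_spans(tokens: List[Span], labels: List[int]) -> List[Span]:
    """Merge runs of consecutive tokens labelled 1 into character spans."""
    spans: List[Span] = []
    prev = 0
    for (start, end), label in zip(tokens, labels):
        if label == 1 and prev == 1:
            # Extend the open span across the whitespace gap to this token.
            spans[-1] = (spans[-1][0], end)
        elif label == 1:
            spans.append((start, end))
        prev = label
    return spans
```

Merging consecutive positive tokens back into a single character span fits the observation that class spans are continuous over several hundred characters: a long AI-generated fragment should come out as one span rather than many short ones.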

