Multi-head Span-based Detector for AI-generated Fragments in Scientific Papers
German Gritsai, Ildar Khabutdinov, Andrey Grabovoy

TL;DR
This paper introduces a multi-head span-based detector that uses multi-task learning to identify AI-generated fragments in scientific papers, improving macro F1-score by 9% over the baseline and reaching 0.96 on the competition's test set.
Contribution
It proposes a novel multi-task learning architecture with two heads for token-level classification of AI-generated text in scientific documents.
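A minimal sketch of how a two-head, multi-task objective over per-token state vectors could look. The summary does not say what the second head predicts, so the span-boundary head below is purely an illustrative assumption, as are the names `multitask_loss`, `w_class`, `w_boundary` and the weighting factor `alpha`. Plain Python lists stand in for the encoder output.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability of the target class.
    return -math.log(softmax(logits)[target])

def linear(state, weights):
    # weights holds one row per output class; state is one token's vector.
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def multitask_loss(states, class_labels, boundary_labels,
                   w_class, w_boundary, alpha=0.5):
    # Both heads read the same per-token state vector (shared encoder).
    # Head 1: token class. Head 2 (assumed): whether a span boundary is here.
    total = 0.0
    for state, c, b in zip(states, class_labels, boundary_labels):
        total += cross_entropy(linear(state, w_class), c)
        total += alpha * cross_entropy(linear(state, w_boundary), b)
    return total / len(states)
```

Sharing one encoder between the heads means the auxiliary loss shapes the same token representations the main classifier uses, which is the usual motivation for multi-task setups of this kind.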
Findings
Achieved a 9% improvement in macro F1-score over the baseline.
Reached a 0.96 F1-score on the competition's test set.
Demonstrated effectiveness of multi-head span-based detection in scientific text.
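For reference, the macro F1-score reported above is the unweighted mean of per-class F1 scores. The sketch below computes it over token-level labels; the function name `macro_f1` and the label strings `"human"` and `"ai"` are illustrative placeholders, not the competition's actual class names.

```python
def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 over every label seen in either list.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

y_true = ["human", "human", "ai", "ai", "ai", "human"]
y_pred = ["human", "ai", "ai", "ai", "ai", "human"]
print(macro_f1(y_true, y_pred))  # (6/7 + 4/5) / 2, about 0.829
```

Because every class counts equally, a rare class that is detected poorly lowers the macro score as much as a frequent one would.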
Abstract
This paper describes a system designed to distinguish between AI-generated and human-written scientific excerpts in the DAGPap24 competition hosted within the Fourth Workshop on Scientific Document Processing. In this competition, the task is to find artificially generated text fragments at the token level in scientific documents. Our work focuses on a multi-task learning architecture with two heads. This approach is motivated by the nature of the task, where class spans are continuous over several hundred characters. We considered different encoder variants for obtaining a state vector for each token in the sequence, as well as different ways of splitting fragments into tokens before feeding them to a transformer-based encoder. This approach allows us to achieve a 9% quality improvement relative to the baseline solution score on the…
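Since the task is framed at the token level while class spans run over long character ranges, one preprocessing step is projecting character spans onto tokens. A minimal sketch follows, assuming whitespace tokenization and `(start, end, label)` spans with an exclusive end. Both are illustrative assumptions rather than the competition's data format, and the helper names are hypothetical.

```python
import re

def tokens_with_offsets(text):
    # Whitespace tokenization, keeping each token's character offsets.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def label_tokens(text, spans, default="human"):
    # spans: list of (start, end, label), end exclusive.
    # Each token takes the label of the span it overlaps most.
    labeled = []
    for token, start, end in tokens_with_offsets(text):
        label, best_overlap = default, 0
        for s, e, span_label in spans:
            overlap = min(end, e) - max(start, s)
            if overlap > best_overlap:
                best_overlap, label = overlap, span_label
        labeled.append((token, label))
    return labeled

text = "We propose a model. It performs well."
spans = [(0, 19, "human"), (20, 37, "ai")]
for token, label in label_tokens(text, spans):
    print(token, label)
```

How the text is split matters here: a subword tokenizer would produce more and shorter tokens per span than the whitespace split used above, which changes the sequence the encoder sees.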
Taxonomy
Topics: Scientific Computing and Data Management
Methods: Sparse Evolutionary Training
