SynSciPass: detecting appropriate uses of scientific text generation

Domenic Rosati

arXiv:2209.03742 · cs.CL · September 13, 2022 · 6 cites

TL;DR

This paper introduces SynSciPass, a nuanced dataset and framework for detecting machine-generated scientific text, addressing limitations of binary classification and improving robustness across domains.

Contribution

It develops a dataset with technology-specific labels for machine-generated text and demonstrates improved detection robustness and technology identification over existing models.
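
As a minimal sketch of what technology-specific labels add over a binary scheme, the Python below tags each passage with the technology that produced it and shows how those tags collapse into human versus machine. The label names, the `Passage` type, and both helper functions are hypothetical and chosen for illustration only; they are not taken from the SynSciPass dataset or its repository.

```python
from dataclasses import dataclass

# Hypothetical label set for illustration; the real SynSciPass labels may differ.
TECHNOLOGY_LABELS = ("human", "generation", "translation", "paraphrase")

# Assumption: technologies an author might appropriately use as assistive tools.
ASSISTIVE_LABELS = frozenset({"translation"})


@dataclass(frozen=True)
class Passage:
    text: str
    technology: str  # expected to be one of TECHNOLOGY_LABELS


def to_binary(passage: Passage) -> str:
    """Collapse a technology-specific label into the human/machine scheme."""
    if passage.technology not in TECHNOLOGY_LABELS:
        raise ValueError(f"unknown technology label: {passage.technology!r}")
    return "human" if passage.technology == "human" else "machine"


def is_assistive_use(passage: Passage) -> bool:
    """True when the passage was produced with an assistive technology."""
    return passage.technology in ASSISTIVE_LABELS


translated = Passage(text="An abstract translated into English.", technology="translation")
generated = Passage(text="A passage written by a language model.", technology="generation")

# Both passages look identical under the binary scheme...
print(to_binary(translated), to_binary(generated))
# ...but the technology label keeps the assistive case distinguishable.
print(is_assistive_use(translated), is_assistive_use(generated))
```

Under the binary scheme both passages are labeled the same, which is the situation the abstract raises as a concern; keeping the technology label lets a downstream reviewer treat translation differently from open-ended generation.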

Findings

01. A model trained on SynSciPass is more robust to domain shift.

02. Current datasets are insufficient for real-world detection scenarios.

03. Detection models can identify the type of text generation technology used.

Abstract

Approaches to machine-generated text detection tend to focus on binary classification of human- versus machine-written text. In the scientific domain, where publishers might use these models to examine manuscripts under submission, misclassification has the potential to cause harm to authors. Additionally, authors may appropriately use text generation models, for example through assistive technologies such as translation tools. In this setting, a binary classification scheme might flag appropriate uses of assistive text generation technology as simply machine generated, which is a cause for concern. In our work, we simulate this scenario by presenting a state-of-the-art detector trained on DAGPap22 with machine-translated passages from SciELO and find that the model performs at random. Given this finding, we develop a framework for dataset development that provides a nuanced…

Code & Models

Repositories

domenicrosati/synscipass
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling