Is writing style predictive of scientific fraud?

Chloé Braud; Anders Søgaard

arXiv:1707.04095 · cs.CL · July 14, 2017

TL;DR

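This paper examines whether writing style can predict scientific fraud. It finds that earlier positive results are likely slightly overestimated because of the leave-one-out testing procedure, that simple models outperform the originally proposed model, and that more abstract linguistic features do not improve detection, although fraudulent papers do show some distinct stylistic patterns.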

Contribution

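The study revisits prior experiments on style-based fraud detection, identifies a likely over-estimate caused by their leave-one-out evaluation, and explores more abstract linguistic features (linguistic complexity and discourse structure), which yield negative results.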

Findings

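01. Simple models outperform the originally proposed model by some margin.

02. More abstract linguistic features, such as linguistic complexity and discourse structure, yield negative results.

03. Fraudulent papers contain less comparison and use different types of hedging and ways of presenting logical reasoning.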

Abstract

The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments, and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: scientific fraud, for example, contains less comparison, as well as different types of hedging and ways of presenting logical reasoning.
