Comparing the writing style of real and artificial papers

Diego R. Amancio

arXiv:1506.05702·cs.CL·February 22, 2016·22 cites

Open Access

TL;DR

This paper presents a network-based methodology to distinguish real scientific papers from artificially generated ones with high accuracy, addressing the issue of scientific fraud and paper authenticity.

Contribution

It introduces a novel approach using complex network features to effectively identify fake papers generated by software like SCIGen.

Findings

1. Achieved at least 89% accuracy in classification
2. Identified key network features like accessibility and betweenness
3. Demonstrated the potential of combining network analysis with traditional methods

Abstract

Recent years have witnessed increasing competition in science. While it promotes research quality in many cases, intense competition among scientists can also trigger unethical scientific behavior. To increase their total number of published papers, some authors even resort to software tools that produce grammatical but meaningless scientific manuscripts. Because automatically generated papers can be mistaken for real papers, it is of paramount importance to develop means of identifying these scientific frauds. In this paper, I devise a methodology to distinguish real manuscripts from those generated with SCIGen, an automatic paper generator. By modeling texts as complex networks (CN), it was possible to discriminate real from fake papers with at least 89% accuracy. A systematic analysis of feature relevance revealed that the accessibility and…
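
To make the idea of modeling a text as a complex network concrete, here is a minimal Python sketch. It assumes a simple word-adjacency model in which consecutive words are linked by an undirected edge; the excerpt above does not spell out the paper's actual preprocessing or feature set, so this is an illustration and not the author's pipeline. The function names `build_adjacency_network` and `betweenness` are hypothetical. Betweenness is computed with Brandes' algorithm for unweighted graphs; accessibility, the other feature named in the abstract, is not implemented here.

```python
from collections import defaultdict, deque


def build_adjacency_network(text):
    """Link each pair of consecutive words with an undirected edge.

    Assumption: whitespace tokenization and lowercasing only; a real
    pipeline would likely also remove stopwords and lemmatize.
    """
    words = text.lower().split()
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from repeated words
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness(graph):
    """Unnormalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted from both directions.
    return {v: c / 2 for v, c in score.items()}


if __name__ == "__main__":
    network = build_adjacency_network("the cat saw the dog near the old barn")
    for word, value in sorted(betweenness(network).items()):
        print(word, value)
```

In a classification setting, per-word values like these would typically be summarized (for example, by their mean and standard deviation over the text) and passed to a classifier as features; that summarization step is likewise an assumption, not something stated in the abstract.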

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies