Scienceography: the study of how science is written
Graham Cormode, S. Muthukrishnan, Jinyun Yan

TL;DR
This paper introduces 'scienceography', the study of how scientific papers are written, using arXiv LaTeX source data to analyze writing patterns, tools, and differences across fields like computer science and mathematics.
Contribution
It pioneers the study of scienceography by analyzing LaTeX source data to uncover writing patterns and differences across scientific disciplines.
Findings
Identified broad writing patterns in computer science and mathematics.
Highlighted differences in writing styles between fields.
Demonstrated the utility of LaTeX source data for scientific writing analysis.
Abstract
Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of 'scienceography', which focuses on the writing of science. We provide a first large scale study using data derived from the arXiv e-print repository. Crucially, our data includes the "source code" of scientific papers (the LaTeX source), which enables us to study features not present in the "final product", such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas, computer science and mathematics, as well as highlighting key differences in the way that science is written…
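To make concrete what "features not present in the final product" can mean, here is a minimal Python sketch that pulls author comments and loaded packages out of a LaTeX source string. It is an illustration only, not the authors' pipeline: the function names `extract_comments` and `extract_packages` are hypothetical, and the regular expressions are a simplification that ignores verbatim environments and the rare `\\%` sequence.

```python
import re

# Assumption: an unescaped '%' starts a comment; '\%' is a literal percent sign.
COMMENT_RE = re.compile(r"(?<!\\)%(.*)$")
# Matches \usepackage{a,b} and \usepackage[opts]{a,b}.
PACKAGE_RE = re.compile(r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}")


def extract_comments(source):
    """Return the non-empty comment texts found in a LaTeX source string."""
    comments = []
    for line in source.splitlines():
        match = COMMENT_RE.search(line)
        if match and match.group(1).strip():
            comments.append(match.group(1).strip())
    return comments


def extract_packages(source):
    """Return package names loaded via \\usepackage, skipping commented-out lines."""
    packages = []
    for line in source.splitlines():
        code = COMMENT_RE.sub("", line)  # drop the comment part of the line
        for match in PACKAGE_RE.finditer(code):
            names = match.group(1).split(",")
            packages.extend(name.strip() for name in names if name.strip())
    return packages


sample = (
    "\\usepackage[utf8]{inputenc}\n"
    "\\usepackage{amsmath, graphicx}\n"
    "We gain 5\\% here. % TODO: check this number\n"
    "% \\usepackage{secret}\n"
)
comments = extract_comments(sample)
packages = extract_packages(sample)
```

On the sample string, the comments capture the TODO note and the commented-out package line, while the package list contains only the packages that are actually loaded.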
Taxonomy
Topics: Scientific Computing and Data Management · Semantic Web and Ontologies · Web Data Mining and Analysis
