Hidden Division of Labor in Scientific Teams Revealed Through 1.6 Million LaTeX Files
Jiaxin Pei, Lulin Yang, Lingfei Wu

TL;DR
This study analyzes 1.6 million LaTeX files to uncover hidden division of labor in scientific teams, revealing that authors specialize in conceptual or technical sections, challenging traditional authorship norms.
Contribution
It introduces the first large-scale dataset on writing contributions, derived from author-specific LaTeX macros, validates its reliability, and uncovers an implicit division of labor in scientific collaborations.
Findings
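Some authors primarily contribute to conceptual sections (e.g., Introduction and Discussion), while others focus on technical sections.
The dataset is validated against self-reported contribution statements (precision = 0.87), author order patterns, field-specific norms, and Overleaf records (Spearman's rho = 0.6, p < 0.05).
This implicit division of labor challenges conventional authorship practices.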
Abstract
Recognition of individual contributions is fundamental to the scientific reward system, yet coauthored papers obscure who did what. Traditional proxies (author order and career stage) reinforce biases, while contribution statements remain self-reported and limited to select journals. We construct the first large-scale dataset on writing contributions by analyzing author-specific macros in LaTeX files from 1.6 million papers (1991-2023) by 2 million scientists. Validation against self-reported statements (precision = 0.87), author order patterns, field-specific norms, and Overleaf records (Spearman's rho = 0.6, p < 0.05) confirms the reliability of the created data. Using explicit section information, we reveal a hidden division of labor within scientific teams: some authors primarily contribute to conceptual sections (e.g., Introduction and Discussion), while others focus on technical…
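To make the idea of "author-specific macros" concrete, here is a minimal sketch of how such macros might be counted per section in a single LaTeX source. It is an illustration under stated assumptions, not the authors' pipeline, which this summary does not describe. The macro names `\alice` and `\bob`, the function `macro_usage_by_section`, and both regular expressions are hypothetical. The sketch also assumes macros are defined in the preamble with `\newcommand` or `\renewcommand`.

```python
import re
from collections import Counter, defaultdict

# Assumption: author macros are declared via \newcommand / \renewcommand.
# Captures the macro name, e.g. "alice" from \newcommand{\alice}[1]{...}.
MACRO_DEF = re.compile(r"\\(?:re)?newcommand\*?\s*\{?\\([A-Za-z]+)\}?")
# Captures the title of each \section{...} (or \section*{...}).
SECTION = re.compile(r"\\section\*?\{([^}]*)\}")


def macro_usage_by_section(tex: str) -> dict[str, Counter]:
    """Count uses of each user-defined macro inside each section body."""
    names = set(MACRO_DEF.findall(tex))
    if not names:
        return {}
    # Match \name only when it is not the prefix of a longer command name.
    alternation = "|".join(sorted(map(re.escape, names), key=len, reverse=True))
    use = re.compile(r"\\(" + alternation + r")(?![A-Za-z])")

    # With one capture group, split() yields:
    # [preamble, title1, body1, title2, body2, ...]
    parts = SECTION.split(tex)
    usage: dict[str, Counter] = defaultdict(Counter)
    for title, body in zip(parts[1::2], parts[2::2]):
        for match in use.finditer(body):
            usage[title][match.group(1)] += 1
    return dict(usage)


# Hypothetical input: two authors who mark their own text with colored macros.
sample = r"""
\newcommand{\alice}[1]{\textcolor{red}{#1}}
\newcommand{\bob}[1]{\textcolor{blue}{#1}}
\section{Introduction}
\alice{We motivate the problem.} \alice{Prior work is limited.}
\section{Methods}
\bob{We fit a model.}
"""

usage = macro_usage_by_section(sample)
for section, counts in usage.items():
    print(section, dict(counts))
```

In this toy input, the hypothetical `\alice` macro appears only in the Introduction and `\bob` only in Methods, which mirrors the conceptual versus technical split the abstract describes. Linking a macro to a named author would need additional information that this sketch does not attempt to recover.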
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices
