Bridging Behavioral Biometrics and Source Code Stylometry: A Survey of Programmer Attribution
Marek Horvath, Emilia Pietrikova, Diomidis Spinellis

TL;DR
This survey reviews the landscape of programmer attribution research using source code analysis, highlighting trends, methodologies, and gaps in the field to guide future investigations.
Contribution
It provides a comprehensive taxonomy and analysis of existing studies, datasets, and techniques in source code-based programmer attribution research.
Findings
Existing studies focus on stylometric features in closed-world attribution
Evaluations rely heavily on small benchmark datasets
Behavioral signals and reproducibility are underexplored
Abstract
Programmer attribution seeks to identify or verify the author of a source code artifact using stylistic, structural, or behavioural characteristics. This problem has been studied across software engineering, security, and digital forensics, resulting in a growing and methodologically diverse set of publications. This paper presents a systematic mapping study of programmer attribution research focused on source code analysis. From an initial set of 135 candidate publications, 47 studies published between 2012 and 2025 were selected through a structured screening process. The included works are analysed along several dimensions, including authorship tasks, feature categories, learning and modelling approaches, dataset sources, and evaluation practices. Based on this analysis, we derive a taxonomy that relates stylistic and behavioural feature types to commonly used machine learning…
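As an illustration of the closed-world setting the abstract describes, where an unknown sample is assigned to one of a fixed set of candidate authors, the following is a minimal sketch only. It is not a method taken from the surveyed studies. The feature choice (character trigram counts), the cosine-similarity nearest-profile rule, the helper names, and the toy snippets for the hypothetical authors "alice" and "bob" are all assumptions made for illustration.

```python
from collections import Counter
import math


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (a simple, layout-sensitive stylistic feature)."""
    return Counter(source[i:i + n] for i in range(len(source) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(sample: str, profiles: dict) -> str:
    """Closed-world attribution: return the candidate whose profile is most similar."""
    query = char_ngrams(sample)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy corpus: one known snippet per candidate author.
known = {
    "alice": "for (int i = 0; i < n; i++) {\n    total += values[i];\n}\n",
    "bob": "for(int idx=0;idx<n;++idx)\n{\n\tsum+=arr[idx];\n}\n",
}
profiles = {author: char_ngrams(code) for author, code in known.items()}

# Unknown sample written in a layout close to the second author's.
unknown = "for(int k=0;k<m;++k)\n{\n\tacc+=data[k];\n}\n"
predicted = attribute(unknown, profiles)
print(predicted)
```

On this toy input the sample is assigned to "bob", whose spacing, brace placement, and pre-increment habit it shares. Real studies use far richer feature sets and learned models, and a closed-world classifier like this always returns one of the known candidates even when the true author is not among them.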
Taxonomy
Topics: Authorship Attribution and Profiling · Software Engineering Research · Academic integrity and plagiarism
