Information-Theoretic Detection of Unusual Source Code Changes
Adriano Torres, Sebastian Baltes, Christoph Treude, and Markus Wagner

TL;DR
This paper introduces an information-theoretic approach to measuring source code evolution and detecting unusual changes. It uses the entropy of tokens and abstract syntax tree nodes, and reports promising results in anomaly detection.
Contribution
It presents a novel entropy-based method for analyzing code complexity and identifying unusual changes, expanding beyond traditional metrics.
Findings
Entropy correlates with code complexity measures.
Entropy-based anomaly detection achieves over 60% precision.
Entropy captures different complexity dimensions than classic metrics.
Abstract
The code base of software projects evolves essentially through inserting and removing information to and from the source code. We can measure this evolution via the elements of information (tokens, words, nodes) of the respective representation of the code. In this work, we approach the measurement of the information content of the source code of open-source projects from an information-theoretic standpoint. Our focus is on the entropy of two fundamental representations of code: tokens and abstract syntax tree nodes, from which we derive definitions of textual and structural entropy. We proceed with an empirical assessment where we evaluate the evolution patterns of the entropy of 95 actively maintained open-source projects. We calculate the statistical relationships between our derived entropy metrics and classic methods of measuring code complexity and learn that entropy may capture…
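To make the notions of textual and structural entropy concrete, here is a minimal sketch. It assumes plain Shannon entropy over the empirical frequency distribution of tokens and of AST node types; the paper's exact definitions are not given in this summary and may differ. The function names (`shannon_entropy`, `token_entropy`, `ast_entropy`) are hypothetical, and Python source is used only because the standard library ships a tokenizer and parser for it.

```python
import ast
import io
import math
import tokenize
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def token_entropy(source):
    """Textual entropy (assumed definition): entropy over lexical token strings."""
    # Layout-only tokens and comments are skipped; this is an illustrative choice.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    readline = io.StringIO(source).readline
    tokens = [tok.string for tok in tokenize.generate_tokens(readline)
              if tok.type not in skip]
    return shannon_entropy(tokens)


def ast_entropy(source):
    """Structural entropy (assumed definition): entropy over AST node type names."""
    node_types = [type(node).__name__ for node in ast.walk(ast.parse(source))]
    return shannon_entropy(node_types)


# Entropy change introduced by a hypothetical edit that adds one function.
before = "def add(a, b):\n    return a + b\n"
after = before + "def sub(a, b):\n    return a - b\n"
print("textual delta:   ", token_entropy(after) - token_entropy(before))
print("structural delta:", ast_entropy(after) - ast_entropy(before))
```

Tracking these deltas across a project's commit history gives one entropy time series per representation, and changes whose delta deviates strongly from the project's usual range would be the natural candidates for "unusual". How the authors threshold such deviations is not stated in this summary.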
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Software Engineering Techniques and Practices
