Anomalicious: Automated Detection of Anomalous and Potentially Malicious Commits on GitHub

Danielle Gonzalez; Thomas Zimmermann; Patrice Godefroid; Max Schaefer

arXiv:2103.03846 · cs.SE · March 11, 2021

TL;DR

Anomalicious is a rule-based tool that uses only commit logs and repository metadata to automatically flag anomalous and potentially malicious commits on GitHub, with the aim of improving the security of open source software.

Contribution

This work introduces a rule-based system that uses only commit and repository metadata to identify potentially malicious contributions, without analyzing the committed code itself.

Findings

1. Detected 53.33% of malicious commits in infected repositories.

2. Flagged less than 1% of commits as suspicious in most cases.

3. Identified non-malicious anomalies in repositories without known threats.

Abstract

Security is critical to the adoption of open source software (OSS), yet few automated solutions currently exist to help detect and prevent malicious contributions from infecting open source repositories. On GitHub, a primary host of OSS, repositories contain not only code but also a wealth of commit-related and contextual metadata - what if this metadata could be used to automatically identify malicious OSS contributions? In this work, we show how to use only commit logs and repository metadata to automatically detect anomalous and potentially malicious commits. We identify and evaluate several relevant factors which can be automatically computed from this data, such as the modification of sensitive files, outlier change properties, or a lack of trust in the commit's author. Our tool, Anomalicious, automatically computes these factors and considers them holistically using a rule-based…
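To make the idea in the abstract concrete, the following is a minimal Python sketch of a rule-based check that combines the three kinds of factors the abstract names: sensitive-file modification, outlier change properties, and lack of trust in the author. It is not the authors' implementation. The `Commit` fields, the sensitive-file suffixes, every threshold, and the "at least two factors" rule are assumptions made purely for illustration; the excerpt above does not state the actual factors' definitions or the tool's decision rules.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List

# All constants below are hypothetical, chosen only for illustration.
SENSITIVE_SUFFIXES = ("package.json", "setup.py", ".sh")
MIN_PRIOR_COMMITS = 3      # authors with fewer prior commits count as untrusted
Z_THRESHOLD = 2.0          # how far above the mean change size counts as an outlier
MIN_FACTORS_TO_FLAG = 2    # number of factors that must fire to flag a commit


@dataclass
class Commit:
    """Metadata only: no file contents are inspected."""
    sha: str
    author: str
    files: List[str]
    lines_changed: int


def touches_sensitive_file(commit: Commit) -> bool:
    # str.endswith accepts a tuple of suffixes.
    return any(path.endswith(SENSITIVE_SUFFIXES) for path in commit.files)


def is_outlier_change(commit: Commit, history: List[Commit]) -> bool:
    # Flags commits much larger than the repository's past commits.
    sizes = [c.lines_changed for c in history]
    if len(sizes) < 2:
        return False  # too little history to judge
    spread = pstdev(sizes)
    if spread == 0:
        # Every past commit had the same size; any different size stands out.
        return commit.lines_changed != sizes[0]
    return (commit.lines_changed - mean(sizes)) / spread > Z_THRESHOLD


def is_untrusted_author(commit: Commit, history: List[Commit]) -> bool:
    prior = Counter(c.author for c in history)
    return prior[commit.author] < MIN_PRIOR_COMMITS


def triggered_factors(commit: Commit, history: List[Commit]) -> List[str]:
    checks = {
        "sensitive_file": touches_sensitive_file(commit),
        "outlier_change": is_outlier_change(commit, history),
        "untrusted_author": is_untrusted_author(commit, history),
    }
    return [name for name, fired in checks.items() if fired]


def is_suspicious(commit: Commit, history: List[Commit]) -> bool:
    # The rule: consider the factors together rather than one at a time.
    return len(triggered_factors(commit, history)) >= MIN_FACTORS_TO_FLAG
```

In this sketch, a large commit from a first-time author that edits `package.json` would trigger all three factors, while a small commit from an established author to an ordinary source file would trigger none. Requiring several factors to agree is one plausible way to keep the share of flagged commits low, but the thresholds here are placeholders and would need tuning per repository.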
