MP-CodeCheck: Evolving Logical Expression Code Anomaly Learning with Iterative Self-Supervision
Urs C. Muff, Celine Lee, Paul Gottschlich, Justin Gottschlich

TL;DR
MP-CodeCheck (MPCC) is a self-supervised system that detects anomalous code patterns in logical program expressions. The authors report that it is more spatially and temporally efficient than ControlFlag, a state-of-the-art self-supervised code anomaly detector.
Contribution
Introduces MPCC, built on two novel programming language representations that let it exhaustively and efficiently process the billions of lines of code used in its self-supervised training.
Findings
MPCC outperforms ControlFlag in spatial and temporal efficiency.
Successfully detects various code anomalies in open-source and proprietary repositories.
Provides qualitative insights into different classes of code anomalies.
Abstract
Machine programming (MP) is concerned with automating software development. According to studies, software engineers spend upwards of 50% of their development time debugging software. To help accelerate debugging, we present MP-CodeCheck (MPCC). MPCC is an MP system that attempts to identify anomalous code patterns within logical program expressions. In designing MPCC, we developed two novel programming language representations, the formations of which are critical in its ability to exhaustively and efficiently process the billions of lines of code that are used in its self-supervised training. To quantify MPCC's performance, we compare it against ControlFlag, a state-of-the-art self-supervised code anomaly detection system; we find that MPCC is more spatially and temporally efficient. We demonstrate MPCC's anomalous code detection capabilities by exercising it on a variety of…
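As a rough illustration of what self-supervised anomaly detection over logical expressions can look like, the sketch below abstracts the conditions of `if` and `while` statements into placeholder patterns and flags patterns that are rare in a corpus. This is a generic frequency-based toy, not MPCC's method: the abstract does not describe MPCC's two representations, and the names `condition_patterns`, `find_rare_patterns`, and `max_ratio` are hypothetical. It targets Python source only and needs Python 3.9+ for `ast.unparse`.

```python
import ast
import copy
from collections import Counter


class _Abstract(ast.NodeTransformer):
    """Replace identifiers and literals with placeholders so that
    structurally similar conditions map to the same pattern."""

    def visit_Name(self, node):
        return ast.Name(id="VAR", ctx=ast.Load())

    def visit_Constant(self, node):
        return ast.Name(id="CONST", ctx=ast.Load())


def condition_patterns(source):
    """Return one abstracted pattern string per if/while condition."""
    patterns = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While)):
            # Work on a copy so the tree being walked is not mutated.
            test = _Abstract().visit(copy.deepcopy(node.test))
            patterns.append(ast.unparse(test))
    return patterns


def find_rare_patterns(sources, max_ratio=0.05):
    """Count patterns across a corpus and return those whose share of
    all observed conditions is at most max_ratio (assumed threshold)."""
    counts = Counter()
    for src in sources:
        counts.update(condition_patterns(src))
    total = sum(counts.values())
    return {p: c for p, c in counts.items() if total and c / total <= max_ratio}
```

A rare pattern is only a candidate for review, not a confirmed bug; a practical system would also need a much larger corpus and some notion of which common pattern the rare one deviates from.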
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability
