Longitudinal Analyses of SAST Tools: A CodeQL Case Study
Jean-Charles Noirot Ferrand, Kyle Domico, Yohan Beugin, Patrick McDaniel

TL;DR
This study measures how effective and stable CodeQL, GitHub's static analysis tool, remains over time by running 114 of its versions against 3993 CVEs from 1622 open-source repositories, revealing strengths and limitations in its vulnerability detection.
Contribution
Introduces a novel longitudinal evaluation method for SAST tools and provides the largest academic analysis of CodeQL's performance on OSS codebases.
Findings
CodeQL detected 171 CVEs, with 83 detectable before fixes.
Detection accuracy varies across versions, with some vulnerabilities no longer detected after updates.
Findings are often actionable when triaged within vulnerable files.
Abstract
Open-source software (OSS) pipelines rely on automated static analysis tools to prevent the introduction of vulnerabilities in code. However, there is limited understanding of the efficacy of these tools across the OSS ecosystem over time. In this paper, we introduce a novel method to evaluate static application security testing (SAST) tools through longitudinal measurements and perform the largest academic study of CodeQL -- the most prevalent static analysis tool from GitHub -- on OSS codebases. We apply our apparatus on 114 versions of CodeQL over time on 3993 CVEs from 1622 repositories to measure key properties of the tool, culminating in more than 20 billion lines of code analyzed. First, we measure its effectiveness, i.e., its ability to detect vulnerabilities before they are fixed. Then, we determine whether these detections were actionable through two measures of the distance…
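The longitudinal idea in the abstract, running many tool versions against the same set of known vulnerabilities, can be illustrated with a minimal sketch. This is not the authors' apparatus: the record layout, the placeholder CVE names, and the helper functions `detection_rate_per_version` and `lost_detections` are all hypothetical, and the data is made up purely for illustration. It assumes each record states whether a given tool version raised an alert on the code as it stood before the fix.

```python
from collections import defaultdict

# Hypothetical records: (cve_id, tool_version_index, alert_on_pre_fix_code).
# Placeholder identifiers and values, not data from the study.
records = [
    ("CVE-A", 0, True),
    ("CVE-A", 1, True),
    ("CVE-A", 2, False),  # detection lost in a later tool version
    ("CVE-B", 0, False),
    ("CVE-B", 1, True),
    ("CVE-B", 2, True),
    ("CVE-C", 0, False),
    ("CVE-C", 1, False),
    ("CVE-C", 2, False),
]


def detection_rate_per_version(records):
    """Fraction of CVEs flagged on the pre-fix code, for each tool version."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for _cve, version, detected in records:
        totals[version] += 1
        hits[version] += int(detected)
    return {v: hits[v] / totals[v] for v in sorted(totals)}


def lost_detections(records):
    """CVEs flagged by some tool version but missed by a later one."""
    by_cve = defaultdict(dict)
    for cve, version, detected in records:
        by_cve[cve][version] = detected
    lost = set()
    for cve, per_version in by_cve.items():
        seen = False
        for version in sorted(per_version):
            if per_version[version]:
                seen = True
            elif seen:
                # Previously detected, now missed: a stability regression.
                lost.add(cve)
                break
    return lost


rates = detection_rate_per_version(records)
lost = lost_detections(records)
print(rates)
print(lost)
```

The first function corresponds loosely to the effectiveness question (is a vulnerability flagged before it is fixed?), and the second to the stability question (does a later version stop flagging something an earlier version caught?). A real pipeline would also have to build each repository at the vulnerable commit and match alerts to the vulnerable code, which this sketch leaves out.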
