CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis

Arunabh Majumdar

arXiv:2604.21917 · cs.CR · April 24, 2026

TL;DR

CrossCommitVuln-Bench introduces a benchmark dataset of 15 real-world Python vulnerabilities spanning multiple commits, revealing significant limitations of static analysis tools in detecting multi-commit vulnerabilities.

Contribution

The paper provides the first curated benchmark dataset and annotation schema for multi-commit Python vulnerabilities, along with baseline evaluations highlighting detection challenges.

Findings

01

The per-commit detection rate (CCDR) is only 13% (2 of 15 chains); the remaining 87% of chains are invisible to per-commit static analysis.

02

The detection rate in cumulative scanning mode is only 27%, so most chains go undetected in that mode as well.

03

The baseline tools, Semgrep and Bandit, perform poorly on vulnerabilities introduced over multiple commits.
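
The percentages above follow from simple counts over the 15 chains, which a few lines of Python can reproduce. This is a minimal sketch, not the paper's evaluation code. The count of 2 per-commit detections comes from the abstract ("both per-commit detections"); the count of 4 cumulative detections is an assumption inferred from 27% of 15 and is not stated in the text. The function name `detection_rate` is hypothetical.

```python
TOTAL_CHAINS = 15  # number of CVE commit chains in the benchmark


def detection_rate(detected: int, total: int) -> float:
    """Fraction of chains flagged by at least one tool."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= detected <= total:
        raise ValueError("detected must lie between 0 and total")
    return detected / total


# 2 detections: stated in the abstract ("both per-commit detections").
per_commit = detection_rate(2, TOTAL_CHAINS)

# 4 detections: ASSUMPTION, inferred from the reported 27% of 15 chains.
cumulative = detection_rate(4, TOTAL_CHAINS)

print(f"per-commit: {per_commit:.0%}, missed: {1 - per_commit:.0%}")
print(f"cumulative: {cumulative:.0%}")
```

With only 15 chains, each detection moves the rate by almost 7 percentage points, so the percentages are best read alongside the raw counts.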

Abstract

We present CrossCommitVuln-Bench, a curated benchmark of 15 real-world Python vulnerabilities (CVEs) in which the exploitable condition was introduced across multiple commits - each individually benign to per-commit static analysis - but collectively critical. We manually annotate each CVE with its contributing commit chain, a structured rationale for why each commit evades per-commit analysis, and baseline evaluations using Semgrep and Bandit in both per-commit and cumulative scanning modes. Our central finding: the per-commit detection rate (CCDR) is 13% across all 15 vulnerabilities - 87% of chains are invisible to per-commit SAST. Critically, both per-commit detections are qualitatively poor: one occurs on commits framed as security fixes (where developers suppress the alert), and the other detects only the minor hardcoded-key component while completely missing the primary…
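
The annotation described in the abstract (a commit chain per CVE, a rationale for why each commit evades per-commit analysis, and tool results in two scanning modes) could be represented roughly as below. This is a hedged sketch only: every class name, field name, and value here is hypothetical and is not taken from the paper's actual schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class CommitAnnotation:
    sha: str                # commit identifier (placeholder values below)
    evasion_rationale: str  # why this commit looks benign in isolation


@dataclass
class ChainAnnotation:
    cve_id: str
    commits: list[CommitAnnotation]
    # Tool name -> whether that tool flagged the chain in the given mode.
    per_commit_hits: dict[str, bool] = field(default_factory=dict)
    cumulative_hits: dict[str, bool] = field(default_factory=dict)

    def is_multi_commit(self) -> bool:
        """A chain qualifies only if it spans at least two commits."""
        return len(self.commits) >= 2

    def detected(self, mode: str) -> bool:
        """True if any tool flagged the chain in 'per_commit' or 'cumulative' mode."""
        hits = {
            "per_commit": self.per_commit_hits,
            "cumulative": self.cumulative_hits,
        }[mode]
        return any(hits.values())


# Placeholder record for illustration; NOT a row from the dataset.
example = ChainAnnotation(
    cve_id="CVE-XXXX-YYYY",
    commits=[
        CommitAnnotation("aaaa111", "adds a helper with no reachable sink"),
        CommitAnnotation("bbbb222", "wires user input to the helper"),
    ],
    per_commit_hits={"semgrep": False, "bandit": False},
    cumulative_hits={"semgrep": True, "bandit": False},
)

print(example.is_multi_commit(), example.detected("per_commit"))
```

Keeping per-tool results separate from the chain-level `detected` check makes it possible to report both which tool fired and whether the chain counts as detected in a given mode.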
