Security Is Relative: Training-Free Vulnerability Detection via Multi-Agent Behavioral Contract Synthesis
Yongchao Wang, Zhiqiu Huang

TL;DR
This paper introduces Phoenix, a training-free multi-agent framework that detects vulnerabilities by synthesizing project-specific behavioral contracts. It targets the semantic ambiguity problem, where identical code can be secure or vulnerable depending on the project, and reports F1 = 0.825 on PrimeVul Paired.
Contribution
Phoenix is presented as the first training-free system to use behavioral contract synthesis for vulnerability detection, with reported gains in accuracy and model efficiency.
Findings
Phoenix achieves F1=0.825 on PrimeVul Paired, surpassing prior methods.
Gherkin specifications significantly improve detection performance.
18% of false positives reveal genuine security issues in patched code.
Abstract
Deep learning for vulnerability detection has shown promising results on early benchmarks, but recent evaluations reveal catastrophic degradation: models achieving F1 > 0.68 on legacy datasets collapse to 0.031 under strict deduplication. We identify the root cause as the semantic ambiguity problem: identical code can be secure or vulnerable depending on project-specific behavioral contracts, rendering global classification fundamentally inadequate. We propose Phoenix, a training-free multi-agent framework that resolves this ambiguity through Behavioral Contract Synthesis. Phoenix decomposes detection into three stages: a Semantic Slicer extracting minimal vulnerability-relevant context, a Requirement Reverse Engineer synthesizing Gherkin behavioral specifications encoding the security contract, and a Contract Judge evaluating code against these specifications via strict compliance…
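The three-stage decomposition described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not specify interfaces, so every name here (`detect`, `toy_slicer`, `toy_spec_writer`, `toy_judge`, the `caller_validates_length` flag) is hypothetical, the Gherkin text is invented for illustration, and simple string rules stand in for what would be LLM-backed agents. The point it tries to show is the semantic ambiguity problem: the same code receives different verdicts once the project-specific contract changes.

```python
from dataclasses import dataclass
from typing import Dict, List

# Identical code is judged under two different project contracts below.
CODE = """\
void copy(char *dst, const char *src, size_t n) {
    log_call();
    memcpy(dst, src, n);
}"""


@dataclass
class Verdict:
    vulnerable: bool
    violated: List[str]  # titles of scenarios the code fails to satisfy


def toy_slicer(code: str) -> str:
    """Stand-in for the Semantic Slicer: keep only copy calls and guards."""
    keep = [ln for ln in code.splitlines()
            if "memcpy" in ln or ln.strip().startswith("if")]
    return "\n".join(keep)


def toy_spec_writer(sliced: str, project: Dict[str, bool]) -> str:
    """Stand-in for the Requirement Reverse Engineer: emit a Gherkin scenario.

    A real agent would infer the contract from project context; here a
    single hypothetical flag decides which contract applies.
    """
    if project["caller_validates_length"]:
        return ("Scenario: copy trusts caller\n"
                "  Given the caller has validated n\n"
                "  Then copy may call memcpy without re-checking n")
    return ("Scenario: copy receives untrusted length\n"
            "  Given n comes from untrusted input\n"
            "  Then copy must check n against the destination size")


def toy_judge(sliced: str, spec: str) -> Verdict:
    """Stand-in for the Contract Judge: check the slice against the spec."""
    title = spec.splitlines()[0].split(":", 1)[1].strip()
    requires_check = "must check n" in spec
    has_check = any(ln.strip().startswith("if") for ln in sliced.splitlines())
    vulnerable = requires_check and not has_check
    return Verdict(vulnerable, [title] if vulnerable else [])


def detect(code: str, project: Dict[str, bool]) -> Verdict:
    """Run the three stages in order: slice, synthesize contract, judge."""
    sliced = toy_slicer(code)
    spec = toy_spec_writer(sliced, project)
    return toy_judge(sliced, spec)


trusted = detect(CODE, {"caller_validates_length": True})
untrusted = detect(CODE, {"caller_validates_length": False})
print(trusted.vulnerable, untrusted.vulnerable)
```

Under these toy rules the unguarded `memcpy` is acceptable when the contract says the caller validates `n`, and a violation when the contract says `n` is untrusted. A global classifier that sees only the code has no way to tell the two cases apart.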
