Security Is Relative: Training-Free Vulnerability Detection via Multi-Agent Behavioral Contract Synthesis

Yongchao Wang; Zhiqiu Huang

arXiv:2604.19012 · cs.CR · April 22, 2026

TL;DR

This paper introduces Phoenix, a training-free multi-agent framework that detects vulnerabilities by synthesizing behavioral contracts, effectively addressing semantic ambiguity and outperforming existing methods on benchmark datasets.

Contribution

Phoenix is the first training-free system that uses behavioral contract synthesis for vulnerability detection, improving accuracy and model efficiency.

Findings

01. Phoenix achieves F1 = 0.825 on PrimeVul Paired, surpassing prior methods.

02. Gherkin specifications significantly improve detection performance.

03. 18% of false positives reveal genuine security issues in patched code.

Abstract

Deep learning for vulnerability detection has shown promising results on early benchmarks, but recent evaluations reveal catastrophic degradation: models achieving F1 > 0.68 on legacy datasets collapse to 0.031 under strict deduplication. We identify the root cause as the semantic ambiguity problem: identical code can be secure or vulnerable depending on project-specific behavioral contracts, rendering global classification fundamentally inadequate. We propose Phoenix, a training-free multi-agent framework that resolves this ambiguity through Behavioral Contract Synthesis. Phoenix decomposes detection into three stages: a Semantic Slicer extracting minimal vulnerability-relevant context, a Requirement Reverse Engineer synthesizing Gherkin behavioral specifications encoding the security contract, and a Contract Judge evaluating code against these specifications via strict compliance…
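The three-stage decomposition described in the abstract can be sketched as a simple pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the prompt wording, the `[SLICE]`/`[SPEC]`/`[JUDGE]` tags, and the `COMPLIANT`/`VIOLATION` answer format are all hypothetical, and `LLM` stands in for any text-in/text-out model client.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for any text-in/text-out model client.
LLM = Callable[[str], str]


@dataclass
class Verdict:
    vulnerable: bool  # True if the judge reported a contract violation
    contract: str     # Gherkin text the code was evaluated against
    rationale: str    # raw judge response, kept for inspection


def semantic_slicer(llm: LLM, function_source: str, project_context: str) -> str:
    """Stage 1: request the minimal vulnerability-relevant context."""
    prompt = (
        "[SLICE] Keep only the statements and declarations relevant to "
        "security-sensitive behavior.\n"
        f"Project context:\n{project_context}\n"
        f"Function:\n{function_source}\n"
    )
    return llm(prompt)


def requirement_reverse_engineer(llm: LLM, code_slice: str) -> str:
    """Stage 2: synthesize a Gherkin specification encoding the security contract."""
    prompt = (
        "[SPEC] Write Gherkin scenarios (Feature/Scenario/Given/When/Then) "
        "stating the security behavior this code is expected to honor.\n"
        f"Code:\n{code_slice}\n"
    )
    return llm(prompt)


def contract_judge(llm: LLM, code_slice: str, contract: str) -> Verdict:
    """Stage 3: check the code against the synthesized contract."""
    prompt = (
        "[JUDGE] Answer COMPLIANT or VIOLATION on the first line, "
        "then explain.\n"
        f"Contract:\n{contract}\n"
        f"Code:\n{code_slice}\n"
    )
    response = llm(prompt)
    stripped = response.strip()
    # Assumption: an empty or unparseable answer is treated as no violation.
    first_line = stripped.splitlines()[0].upper() if stripped else ""
    return Verdict(
        vulnerable=first_line.startswith("VIOLATION"),
        contract=contract,
        rationale=response,
    )


def detect(llm: LLM, function_source: str, project_context: str) -> Verdict:
    """Run the three stages in order; no model training is involved."""
    code_slice = semantic_slicer(llm, function_source, project_context)
    contract = requirement_reverse_engineer(llm, code_slice)
    return contract_judge(llm, code_slice, contract)
```

Because the contract is synthesized per function from project context, the same code slice can yield different verdicts in different projects, which is the ambiguity the abstract argues a global classifier cannot resolve.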
