FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents

Vikram Nitin; Baishakhi Ray; Roshanak Zilouchian Moghaddam

arXiv:2507.15241·cs.SE·July 22, 2025

TL;DR

FaultLine is an LLM agent workflow that automatically generates proof-of-vulnerability (PoV) tests by reasoning about the flow of control and data through a program, and it improves on existing methods across multiple programming languages.

Contribution

This paper introduces FaultLine, an LLM agent workflow that automates PoV test generation without language-specific static or dynamic analysis components, and that outperforms the CodeAct 2.1 baseline by 77%.

Findings

01

FaultLine successfully generated PoV tests for 16 out of 100 vulnerabilities.

02

It outperformed CodeAct 2.1, the state-of-the-art agentic baseline, by 77% in relative terms.

03

Hierarchical reasoning enhances LLM performance in vulnerability testing.

Abstract

Despite the critical threat posed by software security vulnerabilities, reports are often incomplete, lacking the proof-of-vulnerability (PoV) tests needed to validate fixes and prevent regressions. These tests are crucial not only for ensuring patches work, but also for helping developers understand how vulnerabilities can be exploited. Generating PoV tests is a challenging problem, requiring reasoning about the flow of control and data through deeply nested levels of a program. We present FaultLine, an LLM agent workflow that uses a set of carefully designed reasoning steps, inspired by aspects of traditional static and dynamic program analysis, to automatically generate PoV test cases. Given a software project with an accompanying vulnerability report, FaultLine 1) traces the flow of an input from an externally accessible API ("source") to the "sink" corresponding to the…
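To make the idea of a PoV test concrete, here is a minimal sketch of what such a test checks. It is not taken from the paper: the path-traversal bug, the function names (`handle_download`, `resolve_vulnerable`, `resolve_patched`), and the `/srv/files` directory are all hypothetical. The sketch assumes a toy "source" (an externally reachable handler) that passes user input to a "sink" (a path resolver), and shows a test that triggers on the vulnerable version but not on the patched one.

```python
import posixpath

BASE_DIR = "/srv/files"  # hypothetical directory the service is meant to expose


def resolve_vulnerable(base_dir: str, user_path: str) -> str:
    """Hypothetical sink: joins paths without checking for directory traversal."""
    return posixpath.normpath(posixpath.join(base_dir, user_path))


def resolve_patched(base_dir: str, user_path: str) -> str:
    """Hypothetical fix: reject any path that resolves outside base_dir."""
    base = posixpath.normpath(base_dir)
    resolved = posixpath.normpath(posixpath.join(base, user_path))
    if resolved != base and not resolved.startswith(base + "/"):
        raise ValueError("path escapes base directory")
    return resolved


def handle_download(resolver, filename: str) -> str:
    """Hypothetical source: an externally accessible entry point."""
    return resolver(BASE_DIR, filename)


def pov_triggers(resolver) -> bool:
    """Return True if a crafted input reaches the sink and escapes BASE_DIR."""
    payload = "../../etc/passwd"  # crafted input that climbs out of BASE_DIR
    try:
        resolved = handle_download(resolver, payload)
    except ValueError:
        return False  # the patched resolver rejects the payload
    return not resolved.startswith(BASE_DIR + "/")


if __name__ == "__main__":
    print("vulnerable version triggers:", pov_triggers(resolve_vulnerable))
    print("patched version triggers:", pov_triggers(resolve_patched))
```

A test of this shape serves the two purposes the abstract names: it demonstrates that the reported input path is exploitable, and it acts as a regression check once a patch lands.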
