ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs

Reza Fayyazi; Stella Hoyos Trueba; Michael Zuzak; Shanchieh Jay Yang

arXiv:2410.17406 · cs.CR · January 26, 2026


TL;DR

ProveRAG is an LLM-based cybersecurity tool that combines automated web data retrieval, self-evaluation, and verifiable evidence to make vulnerability analysis more accurate and trustworthy for real-time threat mitigation.

Contribution

The paper introduces ProveRAG, a system that combines retrieval-augmented LLMs with self-critique and verification mechanisms to improve vulnerability analysis in cybersecurity.

Findings

1. Achieves over 99% accuracy on exploitation strategies
2. Achieves over 97% accuracy on mitigation strategies
3. Cross-references data from NVD and CWE sources
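The NVD/CWE cross-referencing mentioned in the findings can be illustrated with a minimal sketch, assuming the input is a CVE record in the JSON shape returned by NVD's public CVE API 2.0. This is not code from the ProveRAG repository: the function `link_cves_to_cwes` and the sample record are hypothetical. It only follows the CWE identifiers that NVD attaches to each CVE to the matching MITRE CWE definition pages, the kind of link a retrieval step could use to gather evidence from both sources.

```python
import json

# Hypothetical, trimmed sample shaped like an NVD CVE API 2.0 response.
# The ID and description are placeholders, not a real NVD record.
SAMPLE_RESPONSE = """
{
  "vulnerabilities": [
    {
      "cve": {
        "id": "CVE-0000-0000",
        "descriptions": [
          {"lang": "en", "value": "Example cross-site scripting flaw."}
        ],
        "weaknesses": [
          {
            "source": "nvd@nist.gov",
            "type": "Primary",
            "description": [{"lang": "en", "value": "CWE-79"}]
          }
        ]
      }
    }
  ]
}
"""

# MITRE CWE definition pages follow this URL pattern.
CWE_URL = "https://cwe.mitre.org/data/definitions/{}.html"


def link_cves_to_cwes(response: dict) -> dict[str, list[str]]:
    """Map each CVE ID to the MITRE CWE URLs for the weaknesses NVD lists."""
    links: dict[str, list[str]] = {}
    for item in response.get("vulnerabilities", []):
        cve = item.get("cve", {})
        urls: list[str] = []
        for weakness in cve.get("weaknesses", []):
            for desc in weakness.get("description", []):
                value = desc.get("value", "")
                # NVD also uses placeholders such as "NVD-CWE-Other";
                # keep only numbered CWE identifiers.
                if value.startswith("CWE-") and value[4:].isdigit():
                    url = CWE_URL.format(value[4:])
                    if url not in urls:  # avoid duplicates across sources
                        urls.append(url)
        links[cve.get("id", "")] = urls
    return links


if __name__ == "__main__":
    print(link_cves_to_cwes(json.loads(SAMPLE_RESPONSE)))
```

Running the script prints a mapping from the placeholder CVE ID to the CWE-79 definition page. A real pipeline would fetch live records and then retrieve the linked CWE pages as supporting evidence; that step is omitted here.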

Abstract

In cybersecurity, security analysts constantly face the challenge of mitigating newly discovered vulnerabilities in real time, with over 300,000 vulnerabilities identified since 1999. The sheer volume of known vulnerabilities complicates the detection of patterns for unknown threats. While LLMs can assist, they often hallucinate and lack alignment with recent threats. Over 40,000 vulnerabilities were identified in 2024 alone, and these appeared after the training data cutoff of most popular LLMs (e.g., GPT-5). This poses a major challenge for leveraging LLMs in cybersecurity, where accuracy and up-to-date information are paramount. Therefore, we aim to improve the adaptation of LLMs in vulnerability analysis by mimicking how an analyst performs such tasks. We propose ProveRAG, an LLM-powered system designed to assist in rapidly analyzing vulnerabilities with automated retrieval…


Code & Models

Repositories

RezzFayyazi/ProveRAG (official)


Taxonomy

Topics: Data Quality and Management · Scientific Computing and Data Management · Web Application Security Vulnerabilities