ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs
Reza Fayyazi, Stella Hoyos Trueba, Michael Zuzak, Shanchieh Jay Yang

TL;DR
ProveRAG is an LLM-based cybersecurity tool that enhances vulnerability analysis by integrating automated web data retrieval, self-evaluation, and verifiable evidence to improve accuracy and trustworthiness in real-time threat mitigation.
Contribution
The paper introduces ProveRAG, a novel system that combines retrieval-augmented LLMs with self-critique and verification mechanisms to improve vulnerability analysis in cybersecurity.
Findings
Achieves over 99% accuracy in exploitation strategies
Achieves over 97% accuracy in mitigation strategies
Effectively cross-references data from NVD and CWE sources
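The findings credit ProveRAG with cross-referencing NVD and CWE evidence. The Python sketch below is a minimal, hypothetical illustration of that provenance idea only, not the authors' implementation. `Evidence`, `VerifiedClaim`, `retrieve`, `content_words`, `verify`, the placeholder ID `CVE-0000-0001`, and all snippet text are invented for illustration. Automated web retrieval is replaced with a dictionary lookup, and the LLM-based self-critique is swapped for a plain word-overlap check. The sketch shows only the shape of "drafted claim, supporting evidence, recorded source".

```python
from dataclasses import dataclass, field

# Hypothetical evidence record: a retrieved snippet plus the source it came from.
@dataclass
class Evidence:
    source: str  # illustrative source label, e.g. "NVD" or "CWE"
    text: str

# Hypothetical result type: one drafted claim, its verdict, and its provenance.
@dataclass
class VerifiedClaim:
    claim: str
    supported: bool
    provenance: list[str] = field(default_factory=list)

def retrieve(cve_id: str, corpus: dict[str, list[Evidence]]) -> list[Evidence]:
    """Stand-in for automated web retrieval: look the ID up in a local dict."""
    return corpus.get(cve_id, [])

def content_words(text: str) -> set[str]:
    """Lowercased words longer than three characters, punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if len(w) > 3}

def verify(claims: list[str], evidence: list[Evidence],
           min_overlap: float = 0.5) -> list[VerifiedClaim]:
    """Stand-in for LLM self-critique: a claim counts as supported when at least
    min_overlap of its content words appear in some retrieved snippet; the
    source of every matching snippet is kept as provenance."""
    results = []
    for claim in claims:
        cw = content_words(claim)
        sources = [e.source for e in evidence
                   if cw and len(cw & content_words(e.text)) / len(cw) >= min_overlap]
        results.append(VerifiedClaim(claim, bool(sources), sources))
    return results

# Placeholder CVE ID and snippets, not real advisory text.
corpus = {
    "CVE-0000-0001": [
        Evidence("NVD", "Improper input validation allows remote attackers to execute arbitrary code."),
        Evidence("CWE", "Mitigation: validate all input against an allowlist."),
    ]
}

# Claims an LLM might draft; hard-coded here instead of calling a model.
draft = [
    "Remote attackers can execute arbitrary code.",
    "Mitigation: validate input against an allowlist.",
    "The flaw affects only Windows kernels.",
]

if __name__ == "__main__":
    for r in verify(draft, retrieve("CVE-0000-0001", corpus)):
        print(r.supported, r.provenance, r.claim)
```

Running it marks the first claim as supported by the NVD snippet and the second as supported by the CWE snippet. It flags the third as unsupported because no retrieved snippet backs it. Keeping the source label next to each verdict is what makes the answer checkable. A real system would need a far stronger judge than word overlap.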
Abstract
In cybersecurity, security analysts constantly face the challenge of mitigating newly discovered vulnerabilities in real-time, with over 300,000 vulnerabilities identified since 1999. The sheer volume of known vulnerabilities complicates the detection of patterns for unknown threats. While LLMs can assist, they often hallucinate and lack alignment with recent threats. Over 40,000 vulnerabilities have been identified in 2024 alone, which are introduced after most popular LLMs' (e.g., GPT-4) training data cutoff. This raises a major challenge of leveraging LLMs in cybersecurity, where accuracy and up-to-date information are paramount. Therefore, we aim to improve the adaptation of LLMs in vulnerability analysis by mimicking how an analyst performs such tasks. We propose ProveRAG, an LLM-powered system designed to assist in rapidly analyzing vulnerabilities with automated retrieval…
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Web Application Security Vulnerabilities
