Structured Extraction of Vulnerabilities in OpenVAS and Tenable WAS Reports Using LLMs
Beatriz Machado, Douglas Lautert, Cristhian Kapelinski, Diego Kreutz

TL;DR
This paper presents an automated method using large language models to extract and structure vulnerabilities from OpenVAS and Tenable WAS reports, transforming unstructured data into standardized formats for improved risk management.
Contribution
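It introduces an LLM-based approach for extracting vulnerabilities from OpenVAS and Tenable WAS scanner reports. In the reported evaluation, the best-performing models reach a ROUGE-L similarity above 0.7 against the baseline, which suggests the approach is practical for risk assessment.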
Findings
GPT-4.1 and DeepSeek achieved the highest similarity to the baseline (ROUGE-L > 0.7) on a report with 34 vulnerabilities
Method effectively converts complex reports into structured data
Enables vulnerability prioritization and, as future work, anonymization of sensitive data
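To make the idea of a standardized format concrete, here is a minimal sketch of how structured LLM output could be loaded and prioritized. The record fields (`name`, `severity`, `description`, `source`), the severity ranking, and the helper names are all hypothetical; the paper's actual schema is not given in this summary.

```python
import json
from dataclasses import dataclass

# Hypothetical record layout; the paper's real schema is not stated here.
@dataclass
class Vulnerability:
    name: str
    severity: str
    description: str
    source: str  # e.g. "OpenVAS" or "Tenable WAS"

# Assumed ranking: a lower number means a higher priority.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

def parse_llm_output(raw: str) -> list[Vulnerability]:
    """Parse a JSON array (as an LLM might be prompted to emit) into records."""
    return [Vulnerability(**item) for item in json.loads(raw)]

def prioritize(vulns: list[Vulnerability]) -> list[Vulnerability]:
    """Sort records by severity; unknown severities go last."""
    return sorted(
        vulns,
        key=lambda v: SEVERITY_ORDER.get(v.severity.lower(), len(SEVERITY_ORDER)),
    )

# Illustrative input only; these entries are not taken from the paper.
example = json.dumps([
    {"name": "Missing security header", "severity": "Low",
     "description": "Response lacks a recommended header.", "source": "Tenable WAS"},
    {"name": "Outdated TLS configuration", "severity": "High",
     "description": "Server accepts a deprecated protocol version.", "source": "OpenVAS"},
])
ranked = prioritize(parse_llm_output(example))
```

Once reports from both scanners share one record type, sorting or filtering by severity is a one-line operation, which is the practical benefit the summary attributes to structuring.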
Abstract
This paper proposes an automated LLM-based method to extract and structure vulnerabilities from OpenVAS and Tenable WAS scanner reports, converting unstructured data into a standardized format for risk management. In an evaluation using a report with 34 vulnerabilities, GPT-4.1 and DeepSeek achieved the highest similarity to the baseline (ROUGE-L greater than 0.7). The method demonstrates feasibility in transforming complex reports into usable datasets, enabling effective prioritization and future anonymization of sensitive data.
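For readers unfamiliar with the metric, ROUGE-L scores a candidate text against a reference by the length of their longest common subsequence (LCS). The sketch below computes an F-measure over whitespace tokens. The tokenization, lowercasing, and `beta` weighting are assumptions, since the abstract does not say which ROUGE-L variant or tooling the authors used.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(reference: str, candidate: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure over lowercased whitespace tokens (assumed tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)

# Illustrative strings, not data from the paper.
score = rouge_l(
    "sql injection in login form",
    "sql injection found in the login form",
)
```

In this example the LCS is the full five-token reference, so recall is 1 and precision is 5/7. A score above 0.7, as reported for GPT-4.1 and DeepSeek, means the extracted text keeps most of the baseline's tokens in the same order.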
Taxonomy
Topics: Data Quality and Management · Information and Cyber Security · Digital and Cyber Forensics
