Structured Extraction of Vulnerabilities in OpenVAS and Tenable WAS Reports Using LLMs

Beatriz Machado; Douglas Lautert; Cristhian Kapelinski; Diego Kreutz

arXiv:2511.15745 · cs.CR · November 21, 2025

Open Access

TL;DR

This paper presents an automated method using large language models to extract and structure vulnerabilities from OpenVAS and Tenable WAS reports, transforming unstructured data into standardized formats for improved risk management.

Contribution

The paper introduces an LLM-based approach for extracting vulnerabilities from scanner reports. In the evaluation, the best-performing models reached ROUGE-L above 0.7 against the baseline, which suggests the approach is feasible for practical risk assessment.

Findings

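01. GPT-4.1 and DeepSeek achieved the highest similarity to the baseline (ROUGE-L > 0.7) on a report with 34 vulnerabilities

02. The method converts complex, unstructured scanner reports into structured data in a standardized format

03. The structured output enables vulnerability prioritization and, as future work, anonymization of sensitive data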
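To make the idea of prioritizing structured findings concrete, here is a minimal sketch. The `Finding` fields, the sample records, and the choice to rank by CVSS score are all assumptions made for illustration; the paper's actual output schema and prioritization criteria are not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical fields; the paper's real schema is not given here.
    name: str
    severity: str
    cvss: float
    source: str  # which scanner produced the report


def prioritize(findings):
    """Return findings ordered from highest to lowest CVSS score."""
    return sorted(findings, key=lambda f: f.cvss, reverse=True)


# Made-up sample records, used only to exercise the function.
findings = [
    Finding("Outdated TLS configuration", "Medium", 5.3, "OpenVAS"),
    Finding("SQL injection", "Critical", 9.8, "Tenable WAS"),
    Finding("Missing security header", "Low", 3.1, "Tenable WAS"),
]

ranked = prioritize(findings)
for f in ranked:
    print(f"{f.cvss:>4} {f.severity:<8} {f.name} ({f.source})")
```

Once findings from both scanners share one record type, ranking them together is a single sort, which is the practical benefit of a standardized format.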

Abstract

This paper proposes an automated LLM-based method to extract and structure vulnerabilities from OpenVAS and Tenable WAS scanner reports, converting unstructured data into a standardized format for risk management. In an evaluation using a report with 34 vulnerabilities, GPT-4.1 and DeepSeek achieved the highest similarity to the baseline (ROUGE-L greater than 0.7). The method demonstrates feasibility in transforming complex reports into usable datasets, enabling effective prioritization and future anonymization of sensitive data.
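For readers unfamiliar with the metric, ROUGE-L scores a candidate text against a reference using their longest common subsequence (LCS). The sketch below is a plain-Python illustration assuming lowercase whitespace tokenization and a balanced F-measure (`beta = 1.0`); the abstract does not say which ROUGE-L implementation or tokenization the authors used, so scores from this sketch may differ from theirs.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure between two strings (assumed tokenization: split on whitespace)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Illustrative strings only, not taken from the paper's data.
score = rouge_l("sql injection in login form", "sql injection login")
print(round(score, 2))
```

Here the LCS is three tokens long, giving precision 3/5 and recall 3/3, so the balanced F-measure is 0.75.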

Taxonomy

Topics: Data Quality and Management · Information and Cyber Security · Digital and Cyber Forensics