Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations
Ziyin Zhou, Jianyi Zhang, Xu Ji, Yilong Li, Jiameng Han, Zhangchi Zhao

TL;DR
This paper introduces CRVA-TGRAG, a two-stage framework combining improved retrieval techniques and teacher-guided fine-tuning to enhance LLMs' ability to accurately analyze and update cybersecurity vulnerabilities.
Contribution
The paper presents a novel framework that improves vulnerability knowledge retrieval and consistency in LLMs through document segmentation, ensemble retrieval, and preference-based fine-tuning.
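To make the retrieval side of this contribution concrete, here is a minimal sketch of segmenting documents into chunks and fusing two retrievers' rankings. This summary does not say which segmentation rule, retrievers, or fusion method the paper uses, so everything below is an assumption: the fixed-size word chunks, the two toy rankers (token overlap and character-trigram cosine), and reciprocal rank fusion as the ensemble step are stand-ins, and all function names and sample entries are hypothetical.

```python
from collections import Counter
import math


def segment(doc_id, text, max_words=12):
    """Split a document into fixed-size word chunks (assumed segmentation rule)."""
    words = text.split()
    return [
        (f"{doc_id}#{i // max_words}", " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def keyword_rank(query, chunks):
    """Rank chunk ids by the number of lowercase tokens shared with the query."""
    q = set(query.lower().split())
    scored = [(len(q & set(text.lower().split())), cid) for cid, text in chunks]
    return [cid for _, cid in sorted(scored, key=lambda s: (-s[0], s[1]))]


def trigram_rank(query, chunks):
    """Rank chunk ids by cosine similarity of character-trigram counts."""
    def grams(s):
        s = s.lower()
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    def cosine(a, b):
        dot = sum(a[g] * b[g] for g in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * \
            math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    qg = grams(query)
    scored = [(cosine(qg, grams(text)), cid) for cid, text in chunks]
    return [cid for _, cid in sorted(scored, key=lambda s: (-s[0], s[1]))]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a chunk's score."""
    scores = Counter()
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            scores[cid] += 1.0 / (k + rank)
    return [cid for cid, _ in sorted(scores.items(), key=lambda s: (-s[1], s[0]))]


def retrieve(query, docs, top_n=2):
    """Segment every document, run both rankers, and return the fused top chunks."""
    chunks = [c for doc_id, text in docs.items() for c in segment(doc_id, text)]
    fused = rrf([keyword_rank(query, chunks), trigram_rank(query, chunks)])
    return fused[:top_n]


# Invented toy entries, not real vulnerability records.
DOCS = {
    "entry-old": "buffer overflow in the example parser allows remote code execution",
    "entry-new": "the example parser overflow is fixed in release two of the library",
    "entry-other": "script injection in the demo dashboard search field",
}

if __name__ == "__main__":
    print(retrieve("example parser overflow fixed release", DOCS))
```

Rank-based fusion is used here only because it needs no score calibration between retrievers; a weighted score combination would be an equally plausible reading of "ensemble retrieval".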
Findings
Achieves higher accuracy in retrieving the latest CVEs.
Reduces knowledge conflicts and hallucinations in LLM outputs.
Enhances content quality and question-answering precision.
Abstract
Large Language Models (LLMs) are essential for analyzing and addressing vulnerabilities in cybersecurity. However, of the more than 200,000 vulnerabilities discovered in the past decade, more than 30,000 have since been changed or updated. This necessitates frequent updates to the training datasets and internal knowledge bases of LLMs to maintain knowledge consistency. In this paper, we focus on the problem of knowledge discrepancy and conflict within CVE (Common Vulnerabilities and Exposures) detection and analysis. This problem hinders LLMs' ability to retrieve the latest knowledge from original training datasets, leading to knowledge conflicts, fabrication of factually incorrect results, and hallucinated generations. To address this problem, we propose an innovative two-stage framework called CRVA-TGRAG (Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented…
