LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis

Reza Fayyazi; Michael Zuzak; Shanchieh Jay Yang

arXiv:2506.12100·cs.CR·September 4, 2025

TL;DR

This paper introduces LEA, a method to quantify how much internal knowledge versus retrieved information influences LLM responses in cybersecurity, enhancing trust and safety in vulnerability analysis.

Contribution

LEA provides a novel way to analyze and attribute the sources of information in LLM-generated responses, improving transparency in security applications.

Findings

1. LEA achieves over 95% accuracy in distinguishing retrieval scenarios.
2. It reveals limitations of incorrect retrieval in vulnerability analysis.
3. LEA aids security analysts in auditing LLM responses.

Abstract

Large Language Models (LLMs) are increasingly used for cybersecurity threat analysis, but their deployment in security-sensitive environments raises trust and safety concerns. With over 21,000 vulnerabilities disclosed in 2025, manual analysis is infeasible, making scalable and verifiable AI support critical. Querying LLMs about emerging vulnerabilities is challenging because the models have a training cut-off date. While Retrieval-Augmented Generation (RAG) can inject up-to-date context to alleviate the cut-off date limitation, it remains unclear how much LLMs rely on retrieved evidence versus the model's internal knowledge, and whether the retrieved information is meaningful or even correct. This uncertainty could mislead security analysts, mis-prioritize patches, and increase security risks. Therefore, this work proposes LLM Embedding-based Attribution (LEA) to analyze the…

Code & Models

Repositories

rezzfayyazi/lea (PyTorch, official)

Taxonomy

Topics: Information and Cyber Security