# LLM-driven Provenance Forensics for Threat Investigation and Detection

**Authors:** Kunal Mukherjee, Murat Kantarcioglu

arXiv: 2508.21323 · 2025-11-18

## TL;DR

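PROVSEEK is an LLM-powered agentic framework that automates provenance-based forensic analysis and threat detection. On public DARPA datasets it reports higher precision and recall than retrieval-based methods and provenance-based intrusion detection baselines, and it requires neither task-specific training data nor prior knowledge of the environment.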

## Contribution

The paper introduces PROVSEEK, an agentic framework that combines LLMs, specialized toolchains, Retrieval-Augmented Generation (RAG), and chain-of-thought (CoT) reasoning for automated, scalable forensic analysis and threat detection.

## Key findings

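- Intelligence extraction: outperforms retrieval-based methods with a 34% improvement in contextual precision/recall
- Threat detection: achieves 22% higher precision and 29% higher recall than both a baseline agent approach and a SOTA Provenance-based Intrusion Detection System (PIDS)
- Scalability: token usage grows 1.42x and latency 1.63x as the database size increases 50x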
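
To put the scalability figures in perspective, the short sketch below computes the growth exponent they would imply if cost followed a power law in database size, `cost ∝ size^k`. The power-law form is an assumption made here for illustration only; the abstract reports just the two endpoint ratios, and the function name `implied_exponent` is hypothetical.

```python
import math

def implied_exponent(cost_ratio: float, size_ratio: float) -> float:
    """Exponent k such that size_ratio ** k == cost_ratio.

    Assumes cost scales as size ** k, which the source does not claim.
    """
    return math.log(cost_ratio) / math.log(size_ratio)

DB_GROWTH = 50.0  # database size increase reported in the abstract

# Ratios reported in the abstract for a 50x larger database.
token_exp = implied_exponent(1.42, DB_GROWTH)
latency_exp = implied_exponent(1.63, DB_GROWTH)

print(f"token usage exponent: {token_exp:.3f}")
print(f"latency exponent:     {latency_exp:.3f}")
```

Under that assumption both exponents come out near 0.1, far below the value of 1 that linear scaling would give, which is consistent with the reported sub-linear growth.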

## Abstract

We introduce PROVSEEK, an LLM-powered agentic framework for automated provenance-driven forensic analysis and threat intelligence extraction. PROVSEEK employs specialized toolchains to dynamically retrieve relevant context by generating precise, context-aware queries that fuse knowledge from threat reports with evidence from system provenance data. The framework resolves provenance queries, orchestrates multiple role-specific agents, and synthesizes structured, ground-truth verifiable forensic summaries. By combining agent orchestration with Retrieval-Augmented Generation (RAG), chain-of-thought (CoT) reasoning, and data-guided filtration using a behavioral model, PROVSEEK enables adaptive multi-step analysis that iteratively refines hypotheses, verifies supporting evidence, and produces scalable, interpretable forensic explanations of attack behaviors. PROVSEEK is designed for automated threat investigation without task-specific training data, enabling forensic-style investigation even when no prior knowledge of the environment is available. We conduct a comprehensive evaluation on publicly available DARPA datasets, demonstrating that PROVSEEK outperforms retrieval-based methods on the intelligence extraction task, achieving a 34% improvement in contextual precision/recall. On the threat detection task, PROVSEEK achieves 22%/29% higher precision/recall compared to both a baseline agent approach and a State-Of-The-Art (SOTA) Provenance-based Intrusion Detection System (PIDS). In our scalability study, we show that PROVSEEK increases token usage by only 1.42x and latency by 1.63x as the database size increases 50x, making it well suited to large-scale deployment. We also conduct ablation and error-analysis studies to show how different components of PROVSEEK affect detection performance.
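
As a rough illustration of what multi-step analysis over provenance data can look like, the sketch below starts from an indicator taken from a threat report and expands outward through linked provenance events for a bounded number of steps. It is a plain breadth-first expansion with no LLM involved and is not the authors' method; the event records, the `investigate` function, and the `max_steps` parameter are all hypothetical.

```python
from collections import deque

# Hypothetical provenance records as (subject, action, object) triples.
# These are made up for illustration and do not come from the paper.
EVENTS = [
    ("firefox", "write", "/tmp/dropper"),
    ("/tmp/dropper", "exec", "sh"),
    ("sh", "connect", "10.0.0.5:443"),
    ("cron", "read", "/etc/crontab"),
]

def investigate(seeds, events, max_steps=3):
    """Collect events reachable from the seed entities within max_steps hops."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    evidence = []
    while frontier:
        entity, depth = frontier.popleft()
        if depth >= max_steps:
            continue  # bound the investigation depth
        for event in events:
            subj, _action, obj = event
            if entity not in (subj, obj):
                continue  # event does not involve the current entity
            if event not in evidence:
                evidence.append(event)
            # Queue newly discovered entities for the next step.
            for neighbor in (subj, obj):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return evidence

# Seed the search with an indicator that a threat report might mention.
summary = investigate({"/tmp/dropper"}, EVENTS)
for subj, action, obj in summary:
    print(f"{subj} --{action}--> {obj}")
```

The unrelated `cron` event never enters the result because no chain of events links it to the seed, which mirrors the idea of keeping only evidence connected to the hypothesis under investigation.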

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21323/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21323/full.md

---
Source: https://tomesphere.com/paper/2508.21323