An End-to-End Framework for Functionality-Embedded Provenance Graph Construction and Threat Interpretation
Kushankur Ghosh, Mehar Klair, Kian Kyars, Euijin Choo, Jörg Sander

TL;DR
Auto-Prov is a framework that uses large language models (LLMs) to automatically build functionally enriched provenance graphs from logs, improving attack detection and producing interpretable summaries to aid analysts.
Contribution
The paper introduces Auto-Prov, a novel end-to-end system that automates provenance graph construction, embeds functional context, and enhances attack detection and explanation using LLMs.
Findings
Auto-Prov improves detection accuracy across multiple detectors.
It generalizes well to diverse and evolving log formats.
It produces stable, natural-language attack summaries.
Abstract
Provenance graphs model causal system-level interactions from logs, enabling anomaly detectors to learn normal behavior and detect deviations as attacks. However, existing approaches rely on brittle, manually engineered rules to build provenance graphs, lack functional context for system entities, and provide limited support for analyst investigation. We present Auto-Prov, an adaptive, end-to-end framework that leverages large language models (LLMs) to automatically construct provenance graphs from heterogeneous and evolving logs, embed system-level functional attributes into the graph, enable provenance graph-based anomaly detectors to learn from these enriched graphs, and summarize the detected attacks to assist an analyst's investigation. Auto-Prov clusters unseen log types and efficiently extracts provenance edges and entity-level information via automatically generated rules. It…
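To make the idea of extracting provenance edges from logs concrete, the following is a minimal sketch and not Auto-Prov's implementation. It assumes a hypothetical four-field log format (`<timestamp> <process> <action> <object>`) and a single hand-written regular expression standing in for the automatically generated rules the abstract mentions. The names `LOG_RULE`, `extract_edges`, and `build_graph` are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical log format, invented for illustration only:
#   "<timestamp> <process> <action> <object>"
# In the paper such rules are generated automatically; here one is hand-written.
LOG_RULE = re.compile(
    r"^(?P<ts>\d+) (?P<subject>\S+) "
    r"(?P<action>read|write|exec|connect) (?P<object>\S+)$"
)


def extract_edges(lines):
    """Return (subject, action, object, timestamp) tuples for matching lines."""
    edges = []
    for line in lines:
        m = LOG_RULE.match(line.strip())
        if m:  # lines that no rule matches are skipped
            edges.append((m["subject"], m["action"], m["object"], int(m["ts"])))
    return edges


def build_graph(edges):
    """Adjacency map: source entity -> list of (action, destination, timestamp)."""
    graph = defaultdict(list)
    for subject, action, obj, ts in edges:
        # On a read, data flows from the object to the process;
        # for the other actions it flows from the process to the object.
        if action == "read":
            graph[obj].append((action, subject, ts))
        else:
            graph[subject].append((action, obj, ts))
    return graph


sample_logs = [
    "100 bash exec /usr/bin/curl",
    "101 curl connect 203.0.113.7:443",
    "102 curl write /tmp/payload",
    "103 bash read /tmp/payload",
    "malformed line that no rule matches",
]
edges = extract_edges(sample_logs)
graph = build_graph(edges)
```

Orienting each edge along the direction of data flow is one common convention for provenance graphs. Whether Auto-Prov uses it is not stated in the text above.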
Taxonomy
Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Advanced Graph Neural Networks
