APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training
Xuebo Qiu, Mingqi Lv, Yimei Zhang, Tieming Chen, Tiantian Zhu, Qijie Song, Shouling Ji

TL;DR
APT-CGLP introduces a contrastive graph-language pre-training approach that enables end-to-end semantic matching between provenance graphs and CTI reports, significantly improving APT threat hunting accuracy and efficiency.
Contribution
It presents a novel end-to-end system leveraging LLMs and contrastive learning to align provenance graphs with CTI reports without manual graph extraction.
Findings
Outperforms state-of-the-art baselines in accuracy
Demonstrates effectiveness on four real-world datasets
Enhances threat hunting efficiency
Abstract
Provenance-based threat hunting identifies Advanced Persistent Threats (APTs) on endpoints by correlating attack patterns described in Cyber Threat Intelligence (CTI) with provenance graphs derived from system audit logs. A fundamental challenge in this paradigm lies in the modality gap: the structural and semantic disconnect between provenance graphs and CTI reports. Prior work addresses this by framing threat hunting as a graph matching task: 1) extracting attack graphs from CTI reports, and 2) aligning them with provenance graphs. However, this pipeline incurs severe information loss during graph extraction and demands intensive manual curation, undermining scalability and effectiveness. In this paper, we present APT-CGLP, a novel cross-modal APT hunting system via Contrastive Graph-Language Pre-training, facilitating end-to-end semantic matching between provenance…
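To make the idea of contrastive graph-language alignment concrete, here is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) objective over paired graph and text embeddings. It is not the authors' implementation: the abstract does not specify the loss, the encoders, or the temperature, so the function names, the `temperature=0.07` default, and the toy embeddings below are all illustrative assumptions.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(graph_embs, text_embs, temperature=0.07):
    # Hypothetical helper: graph_embs[i] and text_embs[i] are assumed to be
    # embeddings of a matching (provenance graph, CTI report) pair.
    g = [l2_normalize(v) for v in graph_embs]
    t = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every graph and every report, scaled.
    logits = [[sum(a * b for a, b in zip(gv, tv)) / temperature for tv in t]
              for gv in g]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the graph-to-text and text-to-graph directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

def rank_graphs(text_emb, graph_embs):
    # Return graph indices sorted by cosine similarity to one report.
    q = l2_normalize(text_emb)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(gv)))
              for gv in graph_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (made up for illustration only).
graphs = [[1.0, 0.0], [0.0, 1.0]]
aligned_reports = [[1.0, 0.0], [0.0, 1.0]]
shuffled_reports = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(graphs, aligned_reports)
shuffled_loss = symmetric_contrastive_loss(graphs, shuffled_reports)
best_match = rank_graphs([0.9, 0.1], graphs)[0]
```

In this sketch the loss is low when each graph embedding sits closest to its own report and high when the pairs are mismatched. At hunting time, under the same assumptions, a report embedding can be compared directly against candidate graph embeddings, with no intermediate attack-graph extraction step.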
Taxonomy
Topics: Advanced Graph Neural Networks · Information and Cyber Security · Network Security and Intrusion Detection
