Revealing the True Indicators: Understanding and Improving IoC Extraction From Threat Reports
Evangelos Froudakis, Athanasios Avgetidis, Sean Tyler Frankum, Roberto Perdisci, Manos Antonakakis, Angelos D. Keromytis

TL;DR
This paper introduces a hybrid human-in-the-loop pipeline combining AI and expert validation to improve IoC extraction accuracy, reduce false positives, and create a high-quality benchmark for threat intelligence research.
Contribution
The paper presents the first hybrid human-in-the-loop pipeline for IoC extraction, which improves precision and efficiency, and provides PRISM, a validated benchmark dataset for future research.
Findings
Reduces analysts' work factor by 43%
Produces a high-quality benchmark of 1,791 IoCs
Improves extraction precision with explainable, context-aware labeling
Abstract
Indicators of Compromise (IoCs) are critical for threat detection and response, marking malicious activity across networks and systems. Yet, the effectiveness of automated IoC extraction systems is fundamentally limited by one key issue: the lack of high-quality ground truth. Current extraction tools rely either on manually extracted ground truth, which is labor-intensive and costly, or on automated ground truth creation methods that include non-malicious artifacts, leading to inflated false positive (FP) rates and unreliable threat intelligence. In this work, we analyze the shortcomings of existing ground truth creation strategies and address them by introducing the first hybrid human-in-the-loop pipeline for IoC extraction, which combines a large language model-based classifier (LANCE) with expert analyst validation. Our system improves precision through explainable, context-aware…
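To make the human-in-the-loop idea concrete, here is a minimal sketch of confidence-based triage: candidates are pulled from a report, a classifier labels each one with a reason, and anything below a confidence threshold is queued for an analyst. This is a generic illustration and not the paper's LANCE classifier or its actual routing logic, which this summary does not describe. The regex patterns, `stub_classifier`, the keyword lists, and the `0.8` threshold are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, deliberately simple patterns; real IoC extraction needs
# defanging support, more IoC types, and stricter validation.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE),
}


@dataclass
class Candidate:
    value: str
    kind: str
    context: str  # the sentence in which the candidate appeared


def extract_candidates(report: str) -> List[Candidate]:
    """Collect every pattern match together with its surrounding sentence."""
    found = []
    for sentence in re.split(r"(?<=[.;])\s+", report):
        for kind, pattern in IOC_PATTERNS.items():
            for match in pattern.finditer(sentence):
                found.append(Candidate(match.group(0), kind, sentence))
    return found


def stub_classifier(c: Candidate) -> Tuple[bool, float, str]:
    """Placeholder for a model call (assumption, not LANCE).

    Returns (is_malicious, confidence, explanation).
    """
    ctx = c.context.lower()
    if any(w in ctx for w in ("contacted", "c2", "dropper", "malware")):
        return True, 0.9, "context describes attacker activity"
    if any(w in ctx for w in ("advisory", "vendor", "reference")):
        return False, 0.6, "context looks like a benign reference"
    return False, 0.5, "no clear signal"


def triage(report: str, threshold: float = 0.8):
    """Split candidates into auto-labeled ones and ones needing analyst review."""
    auto, review = [], []
    for cand in extract_candidates(report):
        label, confidence, reason = stub_classifier(cand)
        record = (cand, label, confidence, reason)
        (auto if confidence >= threshold else review).append(record)
    return auto, review


REPORT = (
    "The dropper contacted evil-c2.example and 203.0.113.7; "
    "see vendor.example for the advisory."
)
auto, review = triage(REPORT)
for cand, label, confidence, reason in review:
    print(f"needs review: {cand.value} ({reason}, confidence={confidence})")
```

Keeping the explanation alongside each label is the point of the design: the analyst sees why a candidate was flagged or dismissed, and only the uncertain cases reach the review queue.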
Taxonomy
Topics: Spam and Phishing Detection · Network Security and Intrusion Detection · Advanced Malware Detection Techniques
