Revealing the True Indicators: Understanding and Improving IoC Extraction From Threat Reports

Evangelos Froudakis; Athanasios Avgetidis; Sean Tyler Frankum; Roberto Perdisci; Manos Antonakakis; Angelos D. Keromytis

arXiv:2506.11325 · cs.CR · October 27, 2025

Open Access

TL;DR

This paper introduces a hybrid human-in-the-loop pipeline combining AI and expert validation to improve IoC extraction accuracy, reduce false positives, and create a high-quality benchmark for threat intelligence research.

Contribution

It presents the first hybrid pipeline for IoC extraction that enhances precision and efficiency, and provides PRISM, a validated benchmark dataset for future research.

Findings

01. Reduces analysts' work factor by 43%

02. Produces a high-quality benchmark of 1,791 IoCs

03. Improves extraction precision with explainable, context-aware labeling

Abstract

Indicators of Compromise (IoCs) are critical for threat detection and response, marking malicious activity across networks and systems. Yet, the effectiveness of automated IoC extraction systems is fundamentally limited by one key issue: the lack of high-quality ground truth. Current extraction tools rely either on manually extracted ground truth, which is labor-intensive and costly, or on automated ground truth creation methods that include non-malicious artifacts, leading to inflated false positive (FP) rates and unreliable threat intelligence. In this work, we analyze the shortcomings of existing ground truth creation strategies and address them by introducing the first hybrid human-in-the-loop pipeline for IoC extraction, which combines a large language model-based classifier (LANCE) with expert analyst validation. Our system improves precision through explainable, context-aware…
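
The hybrid idea described in the abstract (an automated classifier proposes a label with an explanation, and an analyst validates it) can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the paper's implementation: the regexes, the `stub_classifier` keyword heuristic (which takes the place of an LLM-based classifier such as LANCE), and the `analyst_decides` callback are assumptions made only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical candidate patterns; real extractors cover many more IoC types.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")


@dataclass
class Candidate:
    value: str    # the extracted artifact, e.g. an IP address
    context: str  # the line of the report it appeared in


@dataclass
class Verdict:
    is_ioc: bool      # proposed label
    explanation: str  # short rationale shown to the analyst


def extract_candidates(report: str) -> List[Candidate]:
    """Collect every pattern match together with its surrounding line."""
    found = []
    for line in report.split("\n"):
        for pattern in (IPV4, SHA256):
            for match in pattern.finditer(line):
                found.append(Candidate(match.group(0), line.strip()))
    return found


def stub_classifier(candidate: Candidate) -> Verdict:
    """Placeholder for an LLM-based classifier: a keyword heuristic only."""
    benign_cues = ("dns resolver", "example", "legitimate")
    text = candidate.context.lower()
    for cue in benign_cues:
        if cue in text:
            return Verdict(False, f"context mentions '{cue}'")
    return Verdict(True, "no benign cue found in context")


def review(
    candidates: List[Candidate],
    classify: Callable[[Candidate], Verdict],
    analyst_decides: Callable[[Candidate, Verdict], bool],
) -> List[str]:
    """Keep only the candidates the analyst confirms after seeing the verdict."""
    accepted = []
    for candidate in candidates:
        verdict = classify(candidate)
        if analyst_decides(candidate, verdict):
            accepted.append(candidate.value)
    return accepted


if __name__ == "__main__":
    report = (
        "The malware beacons to 203.0.113.7 over HTTPS.\n"
        "It queries the public DNS resolver 8.8.8.8 first."
    )
    candidates = extract_candidates(report)
    # Here the "analyst" simply accepts the classifier's proposed label.
    print(review(candidates, stub_classifier, lambda c, v: v.is_ioc))
```

In this sketch the analyst callback sees both the candidate and the classifier's explanation, so it can override a label instead of re-reading the whole report; in practice that callback would be an interactive review step.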

Taxonomy

Topics: Spam and Phishing Detection · Network Security and Intrusion Detection · Advanced Malware Detection Techniques