Proactively Detecting Threats: A Novel Approach Using LLMs
Aniesh Chawla, Udbhav Prasad

TL;DR
This paper evaluates large language models for proactively identifying cybersecurity threats from unstructured web sources, reporting high precision and perfect recall for the best-performing model when detecting malicious indicators of compromise.
Contribution
It introduces a systematic evaluation of LLMs for proactive threat detection, highlighting their potential to improve cybersecurity defenses over reactive methods.
Findings
Gemini 1.5 Pro achieved 0.958 precision in malicious IOC identification
The models showed significant performance variation across sources
Gemini 1.5 Pro achieved perfect recall (1.0) for actual threats
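For readers less familiar with the metrics quoted in these findings, the following is a minimal sketch of how precision and recall (and specificity, which the abstract also reports) are conventionally computed from confusion-matrix counts. The class and function names are illustrative, and the counts in the usage example are hypothetical; they are not the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary 'malicious vs. benign' IOC classifier."""
    tp: int  # malicious IOCs correctly flagged as malicious
    fp: int  # benign IOCs wrongly flagged as malicious
    tn: int  # benign IOCs correctly left unflagged
    fn: int  # malicious IOCs that were missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def precision(c: ConfusionCounts) -> float:
    """Share of flagged IOCs that are truly malicious: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of malicious IOCs that were flagged: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of benign IOCs that were left unflagged: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    example = ConfusionCounts(tp=90, fp=10, tn=40, fn=0)
    print(precision(example), recall(example), specificity(example))
```

Under these definitions, a recall of 1.0 means no malicious IOC was missed (zero false negatives), while a specificity below 1.0 means some benign IOCs were still flagged.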
Abstract
Enterprise security faces escalating threats from sophisticated malware, compounded by expanding digital operations. This paper presents the first systematic evaluation of large language models (LLMs) for proactively identifying indicators of compromise (IOCs) from unstructured web-based threat intelligence sources, in contrast to reactive malware detection approaches. We developed an automated system that pulls IOCs from 15 web-based threat report sources to evaluate six LLMs (Gemini, Qwen, and Llama variants). Our evaluation of 479 webpages containing 2,658 IOCs (711 IPv4 addresses, 502 IPv6 addresses, 1,445 domains) reveals significant performance variations. Gemini 1.5 Pro achieved 0.958 precision and 0.788 specificity for malicious IOC identification, while demonstrating perfect recall (1.0) for actual threats.
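The abstract groups IOCs into IPv4 addresses, IPv6 addresses, and domains. As a rough illustration of how such a breakdown could be tallied, the sketch below classifies candidate strings using Python's standard `ipaddress` module and a simplified hostname pattern. It is not the authors' extraction pipeline: the function name, the domain regex, and the handling of "defanged" notation such as `example[.]com` are all assumptions made for this example.

```python
import ipaddress
import re
from collections import Counter

# Simplified hostname pattern (an assumption, not a full validator):
# dot-separated labels followed by an alphabetic top-level domain.
_DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def classify_ioc(value: str) -> str:
    """Return 'ipv4', 'ipv6', 'domain', or 'unknown' for one candidate string."""
    # Threat reports often "defang" indicators; undo the common "[.]" form.
    candidate = value.strip().replace("[.]", ".")
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        address = None
    if address is not None:
        return "ipv4" if address.version == 4 else "ipv6"
    if _DOMAIN_RE.match(candidate):
        return "domain"
    return "unknown"


def count_by_type(values: list[str]) -> Counter:
    """Tally candidate IOCs by their classified type."""
    return Counter(classify_ioc(v) for v in values)


if __name__ == "__main__":
    # Documentation-reserved addresses and names, used purely as placeholders.
    samples = ["192.0.2.1", "2001:db8::1", "example[.]com", "not an ioc"]
    print(count_by_type(samples))
```

A type check like this only establishes what kind of indicator a string is; deciding whether it is malicious is the separate judgment the paper evaluates LLMs on.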
Taxonomy
Topics: Spam and Phishing Detection · Information and Cyber Security · Cybercrime and Law Enforcement Studies
