Benchmarking LLMs in an Embodied Environment for Blue Team Threat Hunting
Xiaoqun Liu, Feiyang Yu, Xi Li, Guanhua Yan, Ping Yang, Zhaohan Xi

TL;DR
This paper introduces CYBERTEAM, a benchmark environment that structures threat-hunting workflows into modular tasks and functions to evaluate and enhance LLMs' effectiveness in blue team cybersecurity operations.
Contribution
It presents a novel embodied environment and benchmark for assessing LLMs in structured threat-hunting workflows, integrating multiple tasks and functions.
Findings
LLMs show promising capabilities but have limitations in complex threat-hunting tasks.
CYBERTEAM outperforms basic elicitation strategies in guiding LLMs.
Structured function-driven approaches improve threat analysis accuracy.
Abstract
As cyber threats continue to grow in scale and sophistication, blue team defenders increasingly require advanced tools to proactively detect and mitigate risks. Large Language Models (LLMs) offer promising capabilities for enhancing threat analysis. However, their effectiveness in real-world blue team threat-hunting scenarios remains insufficiently explored. In this paper, we present CYBERTEAM, a benchmark designed to guide LLMs in blue teaming practice. CYBERTEAM constructs an embodied environment in two stages. First, it models realistic threat-hunting workflows by capturing the dependencies among analytical tasks from threat attribution to incident response. Next, each task is addressed through a set of embodied functions tailored to its specific analytical requirements. This transforms the overall threat-hunting process into a structured sequence of function-driven operations, where…
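The two-stage construction described in the abstract (task dependencies first, then per-task functions) can be illustrated with a minimal Python sketch. This is not the CYBERTEAM implementation: the task `behavior_analysis`, all three function names, and the dictionary-based context are hypothetical and chosen only for illustration. Only "threat attribution" and "incident response" are named in the abstract. In a real setting each function would presumably wrap an LLM call rather than the toy string logic used here.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

Context = Dict[str, object]
Function = Callable[[Context], Context]

# Hypothetical stand-ins for "embodied functions"; each reads the shared
# context and returns the new fields it contributes.
def extract_indicators(ctx: Context) -> Context:
    # Toy heuristic: keep whitespace-separated tokens with exactly three dots.
    tokens = str(ctx["report"]).split()
    return {"indicators": [t for t in tokens if t.count(".") == 3]}

def summarize_behavior(ctx: Context) -> Context:
    return {"behavior": f"{len(ctx['indicators'])} suspicious indicator(s)"}

def propose_response(ctx: Context) -> Context:
    return {"response": [f"block {i}" for i in ctx["indicators"]]}

# Stage 1 (assumed shape): each task maps to the set of tasks it depends on.
TASK_DEPENDENCIES: Dict[str, Set[str]] = {
    "threat_attribution": set(),
    "behavior_analysis": {"threat_attribution"},  # hypothetical intermediate task
    "incident_response": {"behavior_analysis"},
}

# Stage 2 (assumed shape): each task is handled by an ordered list of functions.
TASK_FUNCTIONS: Dict[str, List[Function]] = {
    "threat_attribution": [extract_indicators],
    "behavior_analysis": [summarize_behavior],
    "incident_response": [propose_response],
}

def run_workflow(report: str) -> Context:
    """Run every task's functions in an order that respects the dependencies."""
    ctx: Context = {"report": report}
    trace: List[str] = []
    for task in TopologicalSorter(TASK_DEPENDENCIES).static_order():
        for fn in TASK_FUNCTIONS[task]:
            ctx.update(fn(ctx))
            trace.append(f"{task}:{fn.__name__}")
    ctx["trace"] = trace
    return ctx

result = run_workflow("Beaconing to 203.0.113.7 observed")
```

The topological sort is what turns the dependency graph into a "structured sequence of function-driven operations": a task's functions run only after every task it depends on has written its results into the shared context.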
Taxonomy
Topics: Information and Cyber Security · Network Security and Intrusion Detection · Advanced Malware Detection Techniques
