Next-generation cyberattack detection with large language models: anomaly analysis across heterogeneous logs
Yassine Chagna, Antal Goldschmidt

TL;DR
This paper applies large language models to anomaly detection across heterogeneous logs. It targets the high false positive rates of traditional intrusion detection and the scarcity of shareable log data, and contributes new datasets, a benchmark of standard metrics, and a two-phase training framework.
Contribution
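The paper presents two heterogeneous log datasets (LogAtlas-Foundation-Sessions and LogAtlas-Defense-Set), shows why standard metrics such as F1 and accuracy are misleading for security applications, and proposes a two-phase LLM training framework for real-time anomaly detection.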
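One common way accuracy and F1 can mislead in detection settings is class imbalance at deployment time. The sketch below uses synthetic counts (10 attacks in 10,000 sessions) that are an assumption for illustration, not figures from the paper, and it may not be the argument the authors make. The names `metrics`, `always_benign`, and `noisy_detector` are hypothetical.

```python
def metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1 and false positive rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

# Synthetic deployment mix (assumed): 10 attack sessions, 9,990 benign ones.
y_true = [1] * 10 + [0] * 9990

# Detector A never raises an alert.
always_benign = [0] * 10000

# Detector B catches 9 of 10 attacks but raises 100 false alarms.
noisy_detector = [1] * 9 + [0] * 1 + [1] * 100 + [0] * 9890

for name, pred in [("always_benign", always_benign),
                   ("noisy_detector", noisy_detector)]:
    m = metrics(y_true, pred)
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Under these assumed counts, accuracy ranks the detector that misses every attack above the one that catches nine of ten, while F1 scores the second detector poorly even though its false positive rate is about 1%. Neither number alone says whether an analyst team could act on the alerts.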
Findings
Benchmarking shows that standard metrics such as F1 and accuracy are misleading for security tasks.
The two-phase training achieves inference times of 0.3-0.5 seconds per session.
Operational costs are maintained below 50 USD per day.
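To put the latency and cost figures in perspective, the following back-of-the-envelope sketch converts per-session inference time into daily throughput. It assumes a single worker processing sessions sequentially at full utilization and treats 50 USD per day as an upper bound for that one worker. Both are assumptions made here, since the summary does not describe the deployment. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def sessions_per_day(latency_ms, workers=1):
    """Sessions processed per day, assuming sequential, saturated workers."""
    return SECONDS_PER_DAY * 1000 * workers // latency_ms

def max_cost_per_1000_sessions(daily_budget_usd, latency_ms, workers=1):
    """Upper bound on cost per 1,000 sessions under the assumed daily budget."""
    return daily_budget_usd / sessions_per_day(latency_ms, workers) * 1000

slow = sessions_per_day(500)   # 0.5 s per session -> 172800 sessions/day
fast = sessions_per_day(300)   # 0.3 s per session -> 288000 sessions/day

print(slow, fast)
print(round(max_cost_per_1000_sessions(50, 500), 3))
print(round(max_cost_per_1000_sessions(50, 300), 3))
```

A real deployment with batching, several workers, or idle time would shift these numbers, so they are best read as an order-of-magnitude check rather than a reported result.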
Abstract
This project explores large language models (LLMs) for anomaly detection across heterogeneous log sources. Traditional intrusion detection systems suffer from high false positive rates, semantic blindness, and data scarcity: logs are inherently sensitive, so clean datasets are rare. We address these challenges through three contributions: (1) LogAtlas-Foundation-Sessions and LogAtlas-Defense-Set, balanced and heterogeneous log datasets with explicit attack annotations and privacy preservation; (2) empirical benchmarking revealing why standard metrics such as F1 and accuracy are misleading for security applications; and (3) a two-phase training framework combining log understanding (Base-AMAN, 3B parameters) with real-time detection (AMAN, 0.5B parameters via knowledge distillation). Results demonstrate practical feasibility, with inference times of 0.3-0.5 seconds per session and…
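The abstract states that the 0.5B AMAN model is obtained from the 3B Base-AMAN model via knowledge distillation but does not give the objective. As a minimal sketch, the classic soft-target formulation is shown below: a temperature-scaled KL term against the teacher plus a cross-entropy term against the true label. The two-class (benign, attack) setup, the `temperature` and `alpha` values, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    soft_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1 - alpha) * hard_term

# Hypothetical logits over (benign, attack) for one session labelled attack.
teacher = [-1.0, 2.0]
agreeing_student = [-0.8, 1.9]
disagreeing_student = [1.5, -0.5]

print(distillation_loss(agreeing_student, teacher, label=1))
print(distillation_loss(disagreeing_student, teacher, label=1))
```

A student whose logits track the teacher's receives a lower loss than one that contradicts it, which is the signal that lets a smaller model inherit the larger model's behavior.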
Taxonomy
Topics: Network Security and Intrusion Detection · Software System Performance and Reliability · Information and Cyber Security
