Semantic-Aware Advanced Persistent Threat Detection Using Autoencoders on LLM-Encoded System Logs

Waleed Khan Mohammed; Zahirul Arief Irfan Bin Shahrul Anuar; Mousa Sufian Mousa Mitani; Hezerul Abdul Karim; Nouar AlDahoul

arXiv:2602.00204·cs.CR·February 3, 2026

Open Access

TL;DR

This paper introduces an APT detection method that encodes system logs as semantic embeddings with a Large Language Model and applies an Autoencoder to those embeddings, outperforming baseline unsupervised methods in AUC-ROC.

Contribution

The paper presents a new approach combining LLM-generated semantic embeddings with Autoencoders for enhanced APT detection in system logs, demonstrating superior performance over existing methods.

Findings

1. Outperforms baseline unsupervised methods in AUC-ROC scores.
2. Effective in detecting stealthy, low-and-slow APT behaviors.
3. Highlights the importance of semantic understanding in cyberattack detection.

Abstract

Advanced Persistent Threats (APTs) are among the most challenging cyberattacks to detect. They are carried out by highly skilled attackers who carefully study their targets and operate in a stealthy, long-term manner. Because APTs exhibit "low-and-slow" behavior, traditional statistical methods and shallow machine learning techniques often fail to detect them. Previous research on APT detection has explored machine learning approaches and provenance graph analysis. However, provenance-based methods often fail to capture the semantic intent behind system activities. This paper proposes a novel anomaly detection approach that leverages semantic embeddings generated by Large Language Models (LLMs). The method enhances APT detection by extracting meaningful semantic representations from unstructured system log data. First, raw system logs are transformed into high-dimensional semantic…
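The general idea of running an Autoencoder over log embeddings can be illustrated with a toy sketch. It is not the authors' implementation: the excerpt above does not specify the LLM encoder, the network architecture, or the thresholding rule, so everything here is an assumption. In particular, `fake_embedding` is a hypothetical stand-in that produces 4-dimensional vectors instead of real LLM embeddings, the autoencoder is a tiny linear one trained with plain SGD, and the threshold is simply the worst reconstruction error seen on benign training data.

```python
import random

random.seed(0)
DIM, HIDDEN = 4, 2  # toy sizes; real LLM embeddings have hundreds of dimensions


def fake_embedding(anomalous=False):
    """Hypothetical stand-in for the LLM embedding of one log line."""
    if anomalous:
        # Puts its mass on directions that benign vectors barely use.
        return [0.0, 0.0, 1.5, -1.5]
    noise = lambda: random.gauss(0.0, 0.01)
    return [random.uniform(-1, 1) + noise(),
            random.uniform(-1, 1) + noise(),
            noise(),
            noise()]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W = init(HIDDEN, DIM)  # encoder weights
V = init(DIM, HIDDEN)  # decoder weights


def score(x):
    """Anomaly score: mean squared reconstruction error of one vector."""
    recon = matvec(V, matvec(W, x))
    return sum((r - xi) ** 2 for r, xi in zip(recon, x)) / DIM


def train(data, epochs=300, lr=0.05):
    """Plain SGD on squared reconstruction error (linear autoencoder, no bias)."""
    for _ in range(epochs):
        for x in data:
            h = matvec(W, x)
            err = [r - xi for r, xi in zip(matvec(V, h), x)]
            # Push the error back through the decoder before updating it.
            back = [sum(V[i][j] * err[i] for i in range(DIM)) for j in range(HIDDEN)]
            for i in range(DIM):
                for j in range(HIDDEN):
                    V[i][j] -= lr * err[i] * h[j]
            for j in range(HIDDEN):
                for k in range(DIM):
                    W[j][k] -= lr * back[j] * x[k]


# Train on benign vectors only, then flag anything that reconstructs poorly.
train_set = [fake_embedding() for _ in range(100)]
train(train_set)
threshold = max(score(x) for x in train_set)  # assumed rule, not from the paper

benign_score = score(fake_embedding())
attack_score = score(fake_embedding(anomalous=True))
print(f"threshold={threshold:.4f} benign={benign_score:.4f} attack={attack_score:.4f}")
```

Because the autoencoder only ever sees benign vectors, it learns to reconstruct the directions they occupy; a vector that lies elsewhere in the embedding space reconstructs badly and receives a high score. In a real pipeline the stand-in vectors would be replaced by embeddings from an actual encoder model, and the threshold would be chosen on held-out data.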

Taxonomy

Topics: Network Security and Intrusion Detection · Software System Performance and Reliability · Information and Cyber Security