LMDG: Advancing Lateral Movement Detection Through High-Fidelity Dataset Generation

Anas Mabrouk; Mohamed Hatem; Mohammad Mamun; Sherif Saad

arXiv:2508.02942 · cs.CR · August 6, 2025

TL;DR

This paper introduces LMDG, a framework for generating high-fidelity, well-labeled datasets for lateral movement attack detection, enabling more accurate evaluation of security systems through realistic multi-stage attack data.

Contribution

LMDG provides a reproducible, extensible method for creating detailed, labeled datasets with a novel Process Tree Labeling technique for precise attack step attribution.

Findings

01 Generated a 25-day enterprise dataset with 35 multi-stage LM attacks

02 Achieved high-precision labeling of malicious activities

03 Produced a realistic dataset with less than 1% malicious activity

Abstract

Lateral Movement (LM) attacks continue to pose a significant threat to enterprise security, enabling adversaries to stealthily compromise critical assets. However, the development and evaluation of LM detection systems are impeded by the absence of realistic, well-labeled datasets. To address this gap, we propose LMDG, a reproducible and extensible framework for generating high-fidelity LM datasets. LMDG automates benign activity generation, multi-stage attack execution, and comprehensive labeling of system and network logs, dramatically reducing manual effort and enabling scalable dataset creation. A central contribution of LMDG is Process Tree Labeling, a novel agent-based technique that traces all malicious activity back to its origin with high precision. Unlike prior methods such as Injection Timing or Behavioral Profiling, Process Tree Labeling enables accurate, step-wise labeling…
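
The abstract describes Process Tree Labeling only at a high level: malicious activity is traced back to its origin. The sketch below illustrates that general idea and is not the authors' implementation. It assumes each process event carries a `pid` and a `ppid`, that the PIDs of the processes that launched each attack step are known, and that PIDs are not reused within the capture window. The function name `label_process_tree`, the event fields, and the step labels are all hypothetical.

```python
from collections import defaultdict, deque


def label_process_tree(events, attack_roots):
    """Propagate attack-step labels from known root processes to descendants.

    events:       iterable of dicts with 'pid' and 'ppid' keys (assumed schema).
    attack_roots: dict mapping a root pid to its attack-step label (assumed input).
    Returns a dict mapping every labeled pid to its attack-step label.
    """
    # Build a parent -> children index from the process-creation events.
    children = defaultdict(list)
    for event in events:
        children[event["ppid"]].append(event["pid"])

    labels = {}
    queue = deque(attack_roots.items())
    while queue:
        pid, step = queue.popleft()
        if pid in labels:
            # Already labeled; this also guards against malformed cyclic data.
            continue
        labels[pid] = step
        # Every child spawned by a labeled process inherits the same step label.
        for child in children[pid]:
            queue.append((child, step))
    return labels


events = [
    {"pid": 100, "ppid": 1},    # process started by the (hypothetical) attack agent
    {"pid": 200, "ppid": 100},  # child of the attack process
    {"pid": 201, "ppid": 200},  # grandchild of the attack process
    {"pid": 300, "ppid": 1},    # unrelated benign process
]
labels = label_process_tree(events, {100: "step-1"})
```

Any process missing from `labels` is treated as benign in this sketch. A real deployment would likely key processes on a stable identifier rather than a raw PID, and would then join the labeled processes against the system and network log records they produced.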
