LMDG: Advancing Lateral Movement Detection Through High-Fidelity Dataset Generation
Anas Mabrouk, Mohamed Hatem, Mohammad Mamun, Sherif Saad

TL;DR
This paper introduces LMDG, a framework for generating high-fidelity, well-labeled datasets for lateral movement attack detection, enabling more accurate evaluation of security systems through realistic multi-stage attack data.
Contribution
LMDG provides a reproducible, extensible method for creating detailed, labeled datasets with a novel Process Tree Labeling technique for precise attack step attribution.
Findings
Generated a 25-day enterprise dataset with 35 multi-stage LM attacks
Labeled malicious activity with high precision by tracing it back to its origin through Process Tree Labeling
Produced a realistic dataset with less than 1% malicious activity
Abstract
Lateral Movement (LM) attacks continue to pose a significant threat to enterprise security, enabling adversaries to stealthily compromise critical assets. However, the development and evaluation of LM detection systems are impeded by the absence of realistic, well-labeled datasets. To address this gap, we propose LMDG, a reproducible and extensible framework for generating high-fidelity LM datasets. LMDG automates benign activity generation, multi-stage attack execution, and comprehensive labeling of system and network logs, dramatically reducing manual effort and enabling scalable dataset creation. A central contribution of LMDG is Process Tree Labeling, a novel agent-based technique that traces all malicious activity back to its origin with high precision. Unlike prior methods such as Injection Timing or Behavioral Profiling, Process Tree Labeling enables accurate, step-wise labeling…
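The idea behind Process Tree Labeling, tracing activity back to a malicious origin, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_process_tree`, the `(pid, ppid)` event format, and the `attack_roots` mapping are all hypothetical. The sketch assumes process-creation records are available and that the process started by each attack step is known. It also ignores PID reuse, which a real agent would need to handle.

```python
from collections import defaultdict, deque


def label_process_tree(events, attack_roots):
    """Propagate attack-step labels from known malicious roots to descendants.

    events: iterable of (pid, ppid) process-creation records (assumed format).
    attack_roots: dict mapping a root pid to an attack-step label (assumed input).
    Returns a dict pid -> label covering each root and all of its descendants.
    """
    # Build a parent -> children index from the creation events.
    children = defaultdict(list)
    for pid, ppid in events:
        children[ppid].append(pid)

    labels = {}
    queue = deque(attack_roots.items())
    while queue:
        pid, step = queue.popleft()
        if pid in labels:
            continue  # already labeled; keep the first label assigned
        labels[pid] = step
        # Every child of a malicious process inherits the same step label.
        for child in children.get(pid, ()):
            queue.append((child, step))
    return labels


if __name__ == "__main__":
    # Hypothetical events: 100 is spawned by an attack step, 200 is benign.
    events = [(100, 1), (101, 100), (102, 101), (200, 1), (201, 200)]
    print(label_process_tree(events, {100: "step-1"}))
```

Processes outside any malicious subtree (here 200 and 201) receive no label, so logs they produce would remain benign. This is what allows labeling per attack step rather than per time window.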
