Reproducibility in Event-Log Research: A Parametrised Generator and Benchmark for Event-based Signatures

Saad Khan; Simon Parkinson; Monika Roopak

arXiv:2601.12978 · cs.CR · January 21, 2026

Open Access

TL;DR

This paper introduces a parametrised generator for synthetic event logs with known signatures, facilitating reproducible benchmarking of signature detection methods in cybersecurity.

Contribution

The paper presents a novel parametrised technique for generating synthetic event datasets with ground-truth signatures, enabling systematic evaluation and comparison of detection approaches.
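To make the idea of a parametrised generator with ground truth concrete, here is a minimal sketch in Python. It is not the authors' generator: the parameter names (`n_signatures`, `events_per_signature`, `repeats`, `n_noise`, `n_event_types`), the `Event` fields, and the choice to model a signature as a fixed sequence of event-type codes are all assumptions made for illustration. The only properties it aims to show are that the output is controlled by explicit parameters, that a fixed seed makes it reproducible, and that every event carries a ground-truth label.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float
    event_id: int  # hypothetical event-type code
    label: int     # ground truth: signature index, or -1 for background noise


def generate_log(n_signatures=3, events_per_signature=4, repeats=5,
                 n_noise=20, n_event_types=50, seed=0):
    """Return (signatures, log) for a synthetic event log (illustrative only)."""
    rng = random.Random(seed)  # fixed seed gives a reproducible dataset

    # Assumption: a signature is a fixed sequence of event-type codes.
    signatures = [
        [rng.randrange(n_event_types) for _ in range(events_per_signature)]
        for _ in range(n_signatures)
    ]

    log = []
    # Plant each signature `repeats` times at random start times.
    for sig_idx, sig in enumerate(signatures):
        for _ in range(repeats):
            start = rng.uniform(0, 1000)
            for offset, event_id in enumerate(sig):
                log.append(Event(start + offset, event_id, sig_idx))

    # Add unrelated background events, labelled -1.
    for _ in range(n_noise):
        log.append(Event(rng.uniform(0, 1000), rng.randrange(n_event_types), -1))

    log.sort(key=lambda e: e.timestamp)
    return signatures, log


signatures, log = generate_log()
```

Because the labels are known by construction, the output of any detection method run on `log` can be scored directly against `label`, which is the property that makes comparative evaluation possible.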

Findings

01 DBSCAN achieved an Adjusted Rand Index above 0.95 on most datasets.

02 The generator produces realistic, signature-containing datasets for benchmarking.

03 The benchmark enhances reproducibility and comparability in cybersecurity research.
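The first finding is reported as an Adjusted Rand Index (ARI), which compares a predicted clustering with ground-truth labels and corrects for chance agreement: 1.0 means identical partitions, and values near 0 mean agreement no better than random. The sketch below computes the standard ARI from two label lists using only the Python standard library. It is included to show what the metric measures, not to reproduce the paper's evaluation pipeline; the function name and the example labels are illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Standard ARI between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("label lists must have the same length")
    total = comb(len(truth), 2)  # number of item pairs
    if total == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Pairs placed together in both labelings, in truth only, in pred only.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_a * sum_b / total  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (sum_ij - expected) / (max_index - expected)


# Cluster names do not matter, only which items are grouped together.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Here `same_grouping` is 1.0 because the two labelings induce the same partition, while `crossed_grouping` is negative because the predicted clusters split every true cluster.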

Abstract

Event-based datasets are crucial for cybersecurity analysis. A key use case is detecting event-based signatures, which represent attacks spanning multiple events and can only be understood once the relevant events are identified and linked. Analysing event datasets is essential for monitoring system security, but their growing volume and frequency create significant scalability and processing difficulties. Researchers rely on these datasets to develop and test techniques for automatically identifying signatures. However, because real datasets are security-sensitive and rarely shared, it becomes difficult to perform meaningful comparative evaluation between different approaches. This work addresses this evaluation limitation by offering a systematic method for generating event logs with known ground truth, enabling reproducible and comparable research. We present a novel parametrised…

Taxonomy

Topics: Software System Performance and Reliability · Data Quality and Management · Network Security and Intrusion Detection