Towards a better labeling process for network security datasets

Sebastian Garcia; Veronica Valeros

arXiv:2305.01337 · cs.CR · May 3, 2023 · 2 cites

TL;DR

This paper proposes a new ontology and tool for structured labeling of network security datasets, aiming to improve dataset quality, evaluation, and real-world applicability in network security research.

Contribution

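It introduces a comprehensive label assignment ontology and a tool, based on that ontology, that assigns structured labels to Zeek network flows. The goal is to make labels more consistent and more useful to every stakeholder, from dataset creators to model users.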

Findings

01. Structured label assignment improves dataset evaluation.
02. Ontology-based labels facilitate better model training.
03. Implementing the process benefits real-life security scenarios.

Abstract

Most network security datasets lack comprehensive label assignment criteria. This hinders the evaluation of the datasets, the training of models, the results obtained, the comparison with other methods, and the evaluation in real-life scenarios. There is neither a labeling ontology nor any tool to help assign labels, so most of the analyzed datasets encode their labels in file or directory names. This paper addresses the problem of building a better labeling process by (i) reviewing the needs of the datasets' stakeholders, from creators to model users, (ii) presenting a new ontology for label assignment, (iii) presenting a new tool, based on the ontology, for assigning structured labels to Zeek network flows, and (iv) studying the differences between generating labels and consuming labels in real-life scenarios. We conclude that a process for structured label assignment is paramount for…

Taxonomy

Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection