Datasets are not Enough: Challenges in Labeling Network Traffic

Jorge Guerra; Carlos Catania; Eduardo Veas

arXiv:2110.05977 · cs.CR · January 3, 2022 · 1 citation


TL;DR

This paper critically analyzes current methodologies for labeling network traffic datasets, highlighting their limitations in quality, volume, and speed, and emphasizes the need for a standardized, continuous labeling approach to improve network security research.

Contribution

It provides an in-depth evaluation of existing network traffic labeling methods, identifying fundamental drawbacks and advocating for a consistent, validated labeling methodology.

Findings

01

Current labeling methods are often outdated and inconsistent.

02

Synthetic data generation hides key aspects of real network behavior.

03

Manual labeling with non-experts faces quality and scalability issues.

Abstract

In contrast to previous surveys, the present work does not focus on reviewing the datasets used in the network security field. Many of the available public labeled datasets represent network behavior only for a particular time period. Given how quickly malicious behavior changes and how difficult it is to label and maintain these datasets, they quickly become obsolete. This work therefore focuses on analyzing the current labeling methodologies applied to network-based data. In network security, labeling a representative network traffic dataset is particularly challenging and costly, since classifying network traces requires highly specialized knowledge. Consequently, most current traffic labeling methods rely on the automatic generation of synthetic network traces, which hides many of the essential aspects necessary…

Taxonomy

Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Anomaly Detection Techniques and Applications