Replication: Contrastive Learning and Data Augmentation in Traffic Classification Using a Flowpic Input Representation

Alessandro Finamore; Chao Wang; Jonatan Krolikowski; Jose M. Navarro; Fuxing Chen; Dario Rossi

arXiv:2309.09733·cs.LG·October 17, 2023

TL;DR

This paper reproduces and extends recent deep learning methods for traffic classification using flowpic representations, confirming their effectiveness and analyzing data augmentation's role across multiple datasets, while uncovering dataset shifts.

Contribution

It reproduces key results of a recent study on traffic classification and validates data augmentation strategies on additional datasets, highlighting dataset shifts affecting accuracy.

Findings

01 Data augmentation improves classification accuracy across datasets.

02 A 20% accuracy drop was observed due to dataset shifts.

03 Reproducibility artifacts are made publicly available.

Abstract

Over the last few years we have witnessed a renewed interest in Traffic Classification (TC), driven by the rise of Deep Learning (DL). Yet the vast majority of the TC literature lacks code artifacts, performance assessments across datasets, and reference comparisons against Machine Learning (ML) methods. Among those works, a recent study from IMC22 [16] is worthy of attention since it adopts recent DL methodologies (namely, few-shot learning, self-supervision via contrastive learning, and data augmentation) that are appealing for networking because they enable learning from a few samples and transferring across datasets. The main result of [16] on the UCDAVIS19, ISCX-VPN and ISCX-Tor datasets is that, with such DL methodologies, 100 input samples are enough to achieve very high accuracy using an input representation called "flowpic" (i.e., a per-flow 2D histogram of the packet size evolution over time). In…
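To make the flowpic idea concrete, here is a minimal sketch of how a per-flow 2D histogram of packet size over time could be built with NumPy. It is an illustration, not the code from the paper or from the tcbench repository: the function name `build_flowpic`, the 32x32 resolution, the 15-second window, the 1500-byte size cap, and the axis layout (rows for packet size, columns for time) are all assumptions made for this example.

```python
import numpy as np


def build_flowpic(timestamps, sizes, resolution=32, duration=15.0, max_size=1500):
    """Return a (resolution x resolution) 2D histogram for a single flow.

    Hypothetical helper: rows index packet size, columns index time.
    All default parameter values are assumptions made for this sketch.
    """
    ts = np.asarray(timestamps, dtype=float)
    sz = np.asarray(sizes, dtype=float)
    if ts.size == 0:
        return np.zeros((resolution, resolution), dtype=np.int64)

    # Express time relative to the earliest packet of the flow.
    ts = ts - ts.min()

    # Keep only packets that fall inside the observation window.
    keep = ts < duration
    ts = ts[keep]
    sz = np.clip(sz[keep], 0, max_size)

    # Count packets per (size bin, time bin) cell.
    hist, _, _ = np.histogram2d(
        sz, ts, bins=resolution, range=[[0, max_size], [0, duration]]
    )
    return hist.astype(np.int64)


# Toy flow: five packets, the last one arrives after the 15 s window.
example = build_flowpic(
    timestamps=[0.0, 0.1, 7.5, 14.9, 20.0],
    sizes=[100, 1500, 750, 40, 1000],
)
```

Each cell counts the packets of a given size range observed in a given time slice, so the resulting matrix can be fed to an image-style classifier. Packets outside the window are dropped here; other choices, such as rescaling time to the flow's duration, are equally plausible.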

Code & Models

Repositories

tcbenchstack/tcbench (PyTorch, official)

Taxonomy

Methods: Contrastive Learning