Language of Network: A Generative Pre-trained Model for Encrypted Traffic Comprehension
Di Zhao, Bo Jiang, Song Liu, Susu Cui, Meng Shen, Dongqi Han, Xingmao Guan, Zhigang Lu

TL;DR
This paper introduces GBC, a pre-trained generative model for encrypted traffic analysis that improves classification and detection accuracy by leveraging protocol-aware tokenization and extensive unlabeled data.
Contribution
The paper presents a novel pre-trained generative model with protocol-aware tokenization for encrypted traffic, enhancing classification and detection over existing methods.
Findings
GBC achieves a 5% higher F1 score than existing methods in classification tasks.
GBC effectively generates realistic encrypted traffic data.
Pretraining improves model robustness to attack evolution.
Abstract
The increasing demand for privacy protection and security considerations leads to a significant rise in the proportion of encrypted network traffic. Since traffic content becomes unrecognizable after encryption, accurate analysis is challenging, making it difficult to classify applications and detect attacks. Deep learning is currently the predominant approach for encrypted traffic classification through feature analysis. However, these methods face limitations due to their high dependence on labeled data and difficulties in detecting attack variants. First, their performance is highly sensitive to data quality, where the high-cost manual labeling process and dataset imbalance significantly degrade results. Second, the rapid evolution of attack patterns makes it challenging for models to identify new types of attacks. To tackle these challenges, we present GBC, a generative model based…
Taxonomy
Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Advanced Malware Detection Techniques
