Refining Network Message Segmentation with Principal Component Analysis
Stephan Kleber, Frank Kargl

TL;DR
This paper introduces a PCA-based method to improve message segmentation accuracy in protocol reverse engineering, significantly enhancing field inference in network traffic analysis.
Contribution
It presents a novel application of PCA to refine message segmentation boundaries, improving the accuracy of protocol message format inference.
Findings
Median improvement of message format accuracy up to 100%
Effective in real-world protocol analysis
Enhances subsequent message analysis tasks
Abstract
Reverse engineering of undocumented protocols is a common task in security analyses of networked services. The communication itself, captured in traffic traces, contains much of the information necessary to perform such protocol reverse engineering. Understanding the format of unknown messages is of particular interest for binary protocols, which are not human-readable. One major challenge is to discover probable fields in a message as the basis for further analyses. Given a set of messages, split into segments of bytes by an existing segmenter, we propose a method to refine the approximation of the field inference. We use principal component analysis (PCA) to discover linearly correlated variance between sets of message segments. We relocate the boundaries of the initial coarse segmentation to match the true fields more accurately. We perform different evaluations of our…
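As a rough illustration of the idea in the abstract, the following sketch applies PCA to a group of equal-length byte segments and reports the offsets where the leading component's loading changes between negligible and significant. It is not the authors' algorithm: the function name `boundary_candidates`, the `rel_threshold` parameter, the use of only the leading component, and the toy data are all assumptions made for this example.

```python
import numpy as np

def boundary_candidates(segments, rel_threshold=0.1):
    """Hypothetical helper: offsets where the leading principal component's
    loading switches between negligible and significant."""
    # Rows are segments, columns are byte offsets within the segment.
    X = np.array([list(s) for s in segments], dtype=float)
    cov = np.cov(X, rowvar=False)            # covariance between byte offsets
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    loading = np.abs(eigvecs[:, -1])         # component with the largest eigenvalue
    significant = loading > rel_threshold * loading.max()
    return [i for i in range(1, len(significant))
            if significant[i] != significant[i - 1]]

# Toy data (assumed): two constant bytes followed by two linearly correlated bytes.
segments = [bytes([0, 1, a, 2 * a]) for a in (10, 20, 30, 40)]
print(boundary_candidates(segments))
```

In this toy input the first two bytes never change, so they carry no variance, while the last two vary together. An offset where the loading jumps is therefore a plausible place to move a segment boundary. A real refinement would need to handle segments of differing length and more than one component.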
Taxonomy
Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Advanced Malware Detection Techniques
