Time Series Deinterleaving of DNS Traffic
Amir Asiaee, Hardik Goel, Shalini Ghosh, Vinod Yegneswaran, Arindam Banerjee

TL;DR
This paper presents a machine-learning approach to deinterleaving DNS traffic streams, with the aim of automating the extraction of malware domain sequences, and finds that LSTMs outperform augmented HMMs on synthetic datasets.
Contribution
It introduces a generative model for user request generation and DNS stream interleaving, then evaluates inference strategies for deinterleaving, showing that LSTMs outperform augmented HMMs on synthetic datasets.
Findings
LSTMs outperform augmented HMMs in deinterleaving accuracy
A generative model for DNS stream interleaving is developed
Evaluation conducted on synthetic datasets
Abstract
Stream deinterleaving is an important problem with various applications in the cybersecurity domain. In this paper, we consider the specific problem of deinterleaving DNS data streams using machine-learning techniques, with the objective of automating the extraction of malware domain sequences. We first develop a generative model for user request generation and DNS stream interleaving. Based on these, we evaluate various inference strategies for deinterleaving, including augmented HMMs and LSTMs, on synthetic datasets. Our results demonstrate that state-of-the-art LSTMs outperform more traditional augmented HMMs in this application domain.
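To make the problem setup concrete, here is a minimal sketch of what stream interleaving and deinterleaving mean. It is not the paper's generative model: the abstract does not specify one, so this sketch assumes each source stream is a first-order Markov chain over domain names and that the merged stream is formed by repeatedly picking a random active stream to emit its next request. The function names, the transition table, and the `.example` domains are all hypothetical. The labels returned by `interleave` are the ground truth that an inference method such as an HMM or LSTM would have to predict from the merged sequence alone.

```python
import random


def generate_stream(transitions, start, length, rng):
    """Sample one domain sequence from a first-order Markov chain (assumed model)."""
    seq = [start]
    while len(seq) < length:
        next_probs = transitions[seq[-1]]
        domains = list(next_probs)
        weights = [next_probs[d] for d in domains]
        seq.append(rng.choices(domains, weights=weights, k=1)[0])
    return seq


def interleave(streams, rng):
    """Merge streams into one sequence, preserving each stream's internal order.

    Returns (merged, labels), where labels[i] is the index of the stream
    that produced merged[i].
    """
    positions = [0] * len(streams)
    merged, labels = [], []
    while True:
        active = [i for i, s in enumerate(streams) if positions[i] < len(s)]
        if not active:
            break
        i = rng.choice(active)  # assumed: uniform choice among unfinished streams
        merged.append(streams[i][positions[i]])
        labels.append(i)
        positions[i] += 1
    return merged, labels


def deinterleave(merged, labels, n_streams):
    """Rebuild the original streams from the merged sequence and its labels."""
    streams = [[] for _ in range(n_streams)]
    for item, label in zip(merged, labels):
        streams[label].append(item)
    return streams


# Hypothetical transition table over placeholder domains.
TRANSITIONS = {
    "a.example": {"b.example": 0.7, "c.example": 0.3},
    "b.example": {"a.example": 0.5, "c.example": 0.5},
    "c.example": {"a.example": 1.0},
}

if __name__ == "__main__":
    rng = random.Random(0)
    streams = [generate_stream(TRANSITIONS, "a.example", 5, rng) for _ in range(3)]
    merged, labels = interleave(streams, rng)
    print(merged)
    print(labels)
    # With the true labels, deinterleaving is exact.
    print(deinterleave(merged, labels, len(streams)) == streams)
```

With the true labels, recovery is trivial. The learning problem is to estimate those labels when only the merged sequence is observed.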
Taxonomy
Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Speech Recognition and Synthesis
