Time Series Deinterleaving of DNS Traffic

Amir Asiaee, Hardik Goel, Shalini Ghosh, Vinod Yegneswaran, Arindam Banerjee

arXiv:1807.05650 · cs.LG · July 17, 2018 · 1 citation

Open Access

TL;DR

This paper presents a machine learning approach to deinterleaving DNS traffic streams, with the goal of automating the extraction of malware domain sequences, and finds that LSTMs outperform augmented HMMs on synthetic datasets.

Contribution

It introduces a generative model for user request generation and DNS stream interleaving, evaluates several inference strategies for deinterleaving, and shows that LSTMs outperform augmented HMMs on this task.
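The generative view can be illustrated with a minimal sketch, assuming each source (a benign user or a malware sample) is a first-order Markov chain over domain names and that sources are interleaved by picking one uniformly at random at each step. The chain structure, the uniform scheduler, and every name here (`interleave`, `sample_next`, the `.example` domains) are hypothetical; the paper's actual generative model is not specified on this page.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical first-order Markov model of one source's requests:
# each domain maps to a list of (next_domain, probability) pairs.
Chain = Dict[str, List[Tuple[str, float]]]


def sample_next(chain: Chain, current: str, rng: random.Random) -> str:
    """Draw the next domain for one source from its transition table."""
    domains, probs = zip(*chain[current])
    return rng.choices(domains, weights=probs, k=1)[0]


def interleave(chains: List[Chain], starts: List[str], length: int,
               seed: int = 0) -> Tuple[List[str], List[int]]:
    """Return an interleaved request stream and each request's true source.

    Assumption: at every step one source is picked uniformly at random and
    emits its next domain; the first request of each source is its start.
    """
    rng = random.Random(seed)
    last = list(starts)              # most recent domain per source
    started = [False] * len(chains)  # has the source emitted anything yet?
    stream: List[str] = []
    labels: List[int] = []
    for _ in range(length):
        k = rng.randrange(len(chains))
        if started[k]:
            last[k] = sample_next(chains[k], last[k], rng)
        started[k] = True
        stream.append(last[k])
        labels.append(k)
    return stream, labels


# Toy sources: benign browsing and a malware beacon alternating two domains.
browsing = {"news.example": [("cdn.example", 0.7), ("news.example", 0.3)],
            "cdn.example": [("news.example", 1.0)]}
malware = {"a1b2.example": [("c3d4.example", 1.0)],
           "c3d4.example": [("a1b2.example", 1.0)]}

stream, labels = interleave([browsing, malware],
                            ["news.example", "a1b2.example"], length=12)
# Deinterleaving means recovering `labels` from `stream` alone.
malware_part = [d for d, k in zip(stream, labels) if k == 1]
```

The labels returned alongside the stream are what makes synthetic data useful here: they provide exact ground truth for scoring any deinterleaver.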

Findings

1. LSTMs outperform augmented HMMs in deinterleaving accuracy.
2. A generative model for DNS stream interleaving is developed.
3. The evaluation is conducted on synthetic datasets.

Abstract

Stream deinterleaving is an important problem with various applications in the cybersecurity domain. In this paper, we consider the specific problem of deinterleaving DNS data streams using machine-learning techniques, with the objective of automating the extraction of malware domain sequences. We first develop a generative model for user request generation and DNS stream interleaving. Based on these models, we evaluate various inference strategies for deinterleaving, including augmented HMMs and LSTMs, on synthetic datasets. Our results demonstrate that state-of-the-art LSTMs outperform more traditional augmented HMMs in this application domain.
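Deinterleaving is the inverse problem: given only the merged stream, assign each request to the source that issued it. The sketch below is a deliberately simple greedy baseline, not the augmented HMM or LSTM evaluated in the paper. It assumes each source's transition probabilities and start distribution are known, keeps only the last domain per source, and labels each request with the source most likely to have emitted it. `greedy_deinterleave` and the toy tables are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical per-source model: P(next domain | previous domain).
Trans = Dict[str, Dict[str, float]]


def greedy_deinterleave(stream: List[str], trans: List[Trans],
                        start: List[Dict[str, float]],
                        floor: float = 1e-6) -> List[int]:
    """Label each request with the source most likely to have emitted it.

    Greedy baseline: only the last domain per source is tracked, and earlier
    assignments are never revisited. `floor` keeps unseen transitions from
    scoring exactly zero; ties go to the lowest source index.
    """
    last: List[Optional[str]] = [None] * len(trans)
    labels: List[int] = []
    for domain in stream:
        scores = []
        for k, table in enumerate(trans):
            if last[k] is None:            # source has not emitted yet
                p = start[k].get(domain, 0.0)
            else:
                p = table.get(last[k], {}).get(domain, 0.0)
            scores.append(max(p, floor))
        best = max(range(len(trans)), key=lambda k: scores[k])
        labels.append(best)
        last[best] = domain
    return labels


browsing = {"news.example": {"cdn.example": 0.7, "news.example": 0.3},
            "cdn.example": {"news.example": 1.0}}
malware = {"a1b2.example": {"c3d4.example": 1.0},
           "c3d4.example": {"a1b2.example": 1.0}}
starts = [{"news.example": 1.0}, {"a1b2.example": 1.0}]
stream = ["news.example", "a1b2.example", "cdn.example",
          "c3d4.example", "news.example", "a1b2.example"]

print(greedy_deinterleave(stream, [browsing, malware], starts))
# → [0, 1, 0, 1, 0, 1]
```

Because decisions are never revisited, this baseline breaks down when two sources request the same domain or when an early request is misattributed; resolving that kind of ambiguity requires a model that reasons over longer context, such as the HMM and LSTM approaches the abstract compares.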

Taxonomy

Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Speech Recognition and Synthesis