Predicting Domain Generation Algorithms with Long Short-Term Memory Networks
Jonathan Woodbridge, Hyrum S. Anderson, Anjum Ahuja, Daniel Grant

TL;DR
This paper introduces an LSTM-based classifier for detecting and identifying malware-generated domain names, significantly outperforming existing methods with high accuracy and low false positive rates.
Contribution
The paper presents a novel LSTM-based approach for DGA detection that does not require feature extraction, achieving superior accuracy over state-of-the-art techniques.
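To illustrate what "does not require feature extraction" can mean in practice, here is a minimal sketch of feeding a raw domain name to a sequence model as integer character indices. The vocabulary, the maximum length of 16, the left-padding scheme, and the name `encode_domain` are assumptions made for this illustration and are not taken from the paper.

```python
import string

# Assumed vocabulary: lowercase letters, digits, hyphen, dot, underscore.
# Index 0 is reserved for padding, so real characters start at 1.
VOCAB = string.ascii_lowercase + string.digits + "-._"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(VOCAB)}
MAX_LEN = 16  # hypothetical fixed sequence length


def encode_domain(domain, max_len=MAX_LEN):
    """Map a domain string to a fixed-length list of character indices."""
    # Characters outside the assumed vocabulary are dropped.
    indices = [CHAR_TO_INDEX[ch] for ch in domain.lower() if ch in CHAR_TO_INDEX]
    # Keep the last max_len characters, then left-pad with zeros.
    indices = indices[-max_len:]
    return [0] * (max_len - len(indices)) + indices


if __name__ == "__main__":
    print(encode_domain("example.com"))
```

A sequence such as this would typically be passed through an embedding layer before the LSTM; no hand-crafted features (length, entropy, n-gram statistics) are computed.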
Findings
Achieves 0.9993 AUC in binary DGA detection.
Attains a micro-averaged F1 score of 0.9906 in multiclass classification of DGA families.
Reduces the false positive rate twentyfold compared with previous methods.
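For readers unfamiliar with the two metrics reported above, the following sketch shows one common way to compute them. It is not the paper's evaluation code; the function names and the toy labels are hypothetical. AUC is computed as the probability that a randomly chosen positive scores above a randomly chosen negative, and micro-averaged F1 pools true positives, false positives, and false negatives over all classes before computing precision and recall.

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (DGA) or 0 (benign).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN across every class."""
    tp = fp = fn = 0
    for cls in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == cls and t == cls:
                tp += 1
            elif p == cls and t != cls:
                fp += 1
            elif t == cls and p != cls:
                fn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data for illustration only.
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
    print(micro_f1(["benign", "bedep", "banjori", "benign"],
                   ["benign", "banjori", "banjori", "benign"]))
```

When every sample has exactly one label, micro-averaged F1 equals overall accuracy, which is why it is dominated by the largest classes.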
Abstract
Various families of malware use domain generation algorithms (DGAs) to generate a large number of pseudo-random domain names to connect to a command and control (C&C) server. In order to block DGA C&C traffic, security organizations must first discover the algorithm by reverse engineering malware samples, and then generate a list of domains for a given seed. The domains are then either preregistered or published in a DNS blacklist. This process is not only tedious, but can also be readily circumvented by malware authors using a large number of seeds in algorithms with multivariate recurrence properties (e.g., banjori) or by using a dynamic list of seeds (e.g., bedep). Another technique to stop malware from using DGAs is to intercept DNS queries on a network and predict whether domains are DGA-generated. Such a technique will alert network administrators to the presence of malware on their…
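The seed-driven generation described in the abstract can be illustrated with a toy example. The sketch below is a made-up generator and does not reproduce banjori, bedep, or any real malware family; the function name `toy_dga`, the use of SHA-256, and the default label length are all assumptions. It shows why a defender who knows both the algorithm and the seed can enumerate the domains in advance, and why changing the seed invalidates that list.

```python
import hashlib


def toy_dga(seed, count, length=12, tld=".com"):
    """Generate `count` pseudo-random domains from a string seed (toy example)."""
    domains = []
    state = seed.encode("utf-8")
    for _ in range(count):
        # Each domain is derived from the hash of the previous state,
        # so the whole sequence is fully determined by the seed.
        state = hashlib.sha256(state).digest()
        label = "".join(chr(ord("a") + b % 26) for b in state[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    # Same seed, same list: this is what makes blacklisting possible.
    print(toy_dga("2016-11-02", 3))
    # A different seed yields an unrelated list.
    print(toy_dga("2016-11-03", 3))
```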
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Spam and Phishing Detection
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
