Command & Control (C2) Traffic Detection Via Algorithm Generated Domain (DGA) Classification Using Deep Learning And Natural Language Processing
Maria Milena Araujo Felix

TL;DR
This paper presents a deep learning and NLP-based method for detecting DGA domains used in C2 malware communication, achieving high accuracy and reducing false positives compared to traditional entropy analysis.
Contribution
It introduces a hybrid dataset and demonstrates that LSTM neural networks outperform statistical entropy in identifying complex DGA domains.
Findings
Achieved 97.2% detection accuracy.
LSTM outperforms entropy analysis for complex DGAs.
Reduces false positives in ambiguous traffic.
Abstract
The sophistication of modern malware, specifically in how it communicates with Command and Control (C2) servers, has rendered static blacklist-based defenses obsolete. Domain Generation Algorithms (DGAs) allow attackers to generate thousands of dynamic addresses daily, which hinders blocking by traditional firewalls. This paper proposes and evaluates a method for detecting DGA domains using Deep Learning and Natural Language Processing (NLP) techniques. The methodology consisted of collecting a hybrid dataset containing 50,000 legitimate and 50,000 malicious domains, followed by the extraction of lexical features and the training of a Long Short-Term Memory (LSTM) recurrent neural network. Results demonstrated that while statistical entropy analysis is effective for simple DGAs, the neural network approach is superior at detecting complex patterns, reaching 97.2% accuracy and reducing…
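The abstract does not list the exact lexical features or the input encoding, so the following is only a minimal sketch of what the entropy baseline and the preprocessing for a character-level model might look like. The function names, the feature set, the alphabet, and the `max_len` value are all assumptions made for illustration, not the author's implementation. The input is assumed to be a registered domain such as `example.com`, and only the part before the first dot is analyzed.

```python
import math
from collections import Counter

# Hypothetical alphabet; id 0 is reserved for padding, the last id for unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-_"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_ID = len(ALPHABET) + 1


def shannon_entropy(label: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not label:
        return 0.0
    n = len(label)
    counts = Counter(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lexical_features(domain: str) -> dict:
    """Illustrative lexical features for the part of the domain before the first dot."""
    label = domain.lower().split(".")[0]
    n = len(label)
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n if n else 0.0,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n if n else 0.0,
    }


def encode(label: str, max_len: int = 32) -> list:
    """Map characters to integer ids, truncated and zero-padded to max_len.

    A fixed-length id sequence like this is a common input format for an
    embedding layer followed by an LSTM.
    """
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in label.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

An entropy threshold on features like these tends to catch random-looking names, while DGAs that concatenate dictionary words can have entropy close to legitimate domains, which is consistent with the abstract's claim that a sequence model does better on complex patterns.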
Taxonomy
Topics: Network Security and Intrusion Detection · Network Packet Processing and Optimization · Internet Traffic Analysis and Secure E-voting
