CharBot: A Simple and Effective Method for Evading DGA Classifiers

Jonathan Peck; Claire Nie; Raaghavi Sivaguru; Charles Grumer; Femi Olumofin; Bin Yu; Anderson Nascimento; Martine De Cock

arXiv:1905.01078·cs.LG·May 31, 2019

TL;DR

CharBot is a straightforward and efficient black-box adversarial attack that generates unregistered domain names capable of evading state-of-the-art machine learning-based DGA classifiers, exposing their vulnerability.

Contribution

We introduce CharBot, a simple new DGA that evades current classifiers without requiring any knowledge of them, highlighting the need for more robust detection methods.

Findings

01

CharBot successfully evades classifiers like FANCI and LSTM.MI.

02

Retraining classifiers on CharBot samples is ineffective.

03

DGA classifiers relying solely on domain strings are inherently vulnerable.

Abstract

Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently…

Tables

Table I. Adversarial data sets

DGA	Data Set	Seeds Used	# Unique Synthetic Domains	# Unregistered Domains (out of 500 sampled)
CharBot	Training	2018-12-04	100,000	500 (100%)
CharBot	Testing	2019-01-01	10,000	500 (100%)
DeepDGA	Training	2018-12-04	100,000	499 (99.8%)
DeepDGA	Testing	2019-01-01	10,000	499 (99.8%)
DeceptionDGA	Training	N/A	100,000	494 (98.8%)
DeceptionDGA	Testing	N/A	10,000	494 (98.8%)

Table II. Features used by FANCI and B-RF. (*) For these features, FANCI uses the dot-free, public-suffix-free domain. (**) For these features, FANCI uses the public-suffix-free domain.

#	Feature	FANCI	B-RF
1	Domain name length	✓	✓
2	Second level domain length	✗	✓
3	Top level domain length	✗	✓
4	Domain Unique Characters length	✗	✓
5	SLD Unique Characters length	✗	✓
6	TLD Unique Characters length	✗	✓
7	Has malicious TLD	✗	✓
8	Has Valid TLD	✓	✗
9	TLD Hash	✗	✓
10	Contains Digits	✓	✗
11	Starts with Digit	✗	✓
12	Underscore Ratio*	✓	✗
13	Symbol ratio	✗	✓
14	Hex ratio	✗	✓
15	Digit Ratio*	✓	✓
16	Vowel Ratio*	✓	✓
17	Consonant Ratio	✗	✓
18	Ratio of Repeated Characters*	✓	✓
19	Ratio of Consecutive Consonants*	✓	✓
20	Ratio of Consecutive Digits*	✓	✓
21	Number of tokens in SLD	✗	✓
22	Number of digits in SLD	✗	✓
23	Entropy*	✓	✓
24	Gini Index	✗	✓
25	Classification error of characters	✗	✓
26	N-Gram Distribution*	✓	✗
27	2-Gram Median	✗	✓
28	3-Gram Median	✗	✓
29	2-Gram Circle Median	✗	✓
30	3-Gram Circle Median	✗	✓
31	Number of Subdomains**	✓	✗
32	Subdomain Length Mean**	✓	✗
33	Has www Prefix	✓	✗
34	Contains Single-Character Subdomain**	✓	✗
35	Is Exclusive Prefix Repetition	✓	✗
36	Contains TLD as Subdomain**	✓	✗
37	Ratio of Digit-Exclusive Subdomains**	✓	✗
38	Ratio of Hexadecimal-Exclusive Subdomains**	✓	✗
39	Contains IP Address**	✓	✗
40	Alphabet Cardinality*	✓	✗
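A few of the string-level features listed above (domain name length, vowel ratio, digit ratio, entropy) can be illustrated with a short sketch. This is not the FANCI or B-RF implementation; the function names, the vowel set, and the choice of Shannon entropy over characters are assumptions made for illustration, and the input is assumed to be a name that has already been stripped of dots and public suffix where the table notes require it.

```python
import math
from collections import Counter

# Assumption: vowels are the five ASCII vowels; the source does not specify.
VOWELS = set("aeiou")


def char_entropy(name):
    """Shannon entropy (in bits) of the character distribution of `name`."""
    counts = Counter(name)
    total = len(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def basic_features(name):
    """Hypothetical helper: compute a handful of the listed features."""
    if not name:
        raise ValueError("name must be a non-empty string")
    name = name.lower()
    total = len(name)
    return {
        "length": total,
        "vowel_ratio": sum(ch in VOWELS for ch in name) / total,
        "digit_ratio": sum(ch.isdigit() for ch in name) / total,
        "entropy": char_entropy(name),
    }


if __name__ == "__main__":
    print(basic_features("example123"))
```

Features of this kind depend only on the domain string itself, which is the property the paper's third finding is concerned with.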

Table III. FANCI features not expected to have any effect in our experiments

#	Feature
31	Number of Subdomains
33	Has www Prefix
34	Contains Single-Character Subdomain
35	Is Exclusive Prefix Repetition
36	Contains TLD as Subdomain

Table IV. Performance metrics of LSTM.MI, FANCI and B-RF on the different data sets.

Classifier	Data set	TPR (FPR=0.001)	AUC (FPR=0.001)	TPR (FPR=0.01)	AUC (FPR=0.01)
LSTM.MI	AlexaBamb	96.79%	94.91%	99.27%	98.89%
	AlexaBamb + CharBot	95.50%	95.35%	98.89%	98.67%
	AlexaBamb + DeepDGA	96.65%	96.44%	99.20%	99.00%
	AlexaBamb + DeceptionDGA	95.17%	95.05%	98.54%	98.46%
	QnameBamb	81.98%	83.37%	98.98%	96.68%
	QnameBamb + CharBot	82.91%	84.98%	98.51%	96.48%
	QnameBamb + DeepDGA	83.22%	83.98%	98.85%	96.50%
	QnameBamb + DeceptionDGA	84.66%	85.57%	98.61%	96.82%
FANCI	AlexaBamb	—	—	74.46%	80.46%
	AlexaBamb + CharBot	—	—	72.49%	78.71%
	AlexaBamb + DeepDGA	—	—	73.98%	80.02%
	AlexaBamb + DeceptionDGA	—	—	73.84%	80.18%
	QnameBamb	—	—	74.13%	79.06%
	QnameBamb + CharBot	—	—	72.13%	77.80%
	QnameBamb + DeepDGA	—	—	72.89%	78.19%
	QnameBamb + DeceptionDGA	—	—	73.65%	78.86%
B-RF	AlexaBamb	85.72%	82.93%	94.72%	94.67%
	AlexaBamb + CharBot	84.62%	79.30%	93.89%	93.75%
	AlexaBamb + DeepDGA	85.81%	80.94%	94.39%	94.35%
	AlexaBamb + DeceptionDGA	84.51%	78.62%	93.78%	93.73%
	QnameBamb	82.75%	73.85%	96.88%	94.52%
	QnameBamb + CharBot	83.03%	74.98%	96.37%	94.29%
	QnameBamb + DeepDGA	82.64%	73.87%	96.39%	94.26%
	QnameBamb + DeceptionDGA	82.98%	76.26%	96.26%	94.56%
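The TPR columns above are reported at fixed false positive rates. The extracted page does not say how the decision thresholds were chosen, so the following is only a sketch of one common way to obtain a TPR at a target FPR from classifier scores. The function name `tpr_at_fpr` and the convention that a higher score means "more likely DGA" are assumptions.

```python
def tpr_at_fpr(benign_scores, malicious_scores, target_fpr):
    """Return (tpr, threshold) for the strictest-possible recall at target_fpr.

    A sample is flagged when its score is strictly greater than the threshold.
    The threshold is picked so that at most floor(target_fpr * n_benign)
    benign samples are flagged.
    """
    ranked = sorted(benign_scores, reverse=True)
    # Small epsilon guards against float rounding such as 0.3 * 10 = 2.999...
    allowed = int(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        threshold = float("-inf")  # every benign sample may be flagged
    else:
        threshold = ranked[allowed]
    detected = sum(score > threshold for score in malicious_scores)
    return detected / len(malicious_scores), threshold


if __name__ == "__main__":
    benign = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    malicious = [0.85, 0.95, 0.5, 0.7]
    print(tpr_at_fpr(benign, malicious, target_fpr=0.1))
```

With real data the benign set must be large enough for the target rate to be meaningful: at FPR=0.001, fewer than 1,000 benign samples would allow no false positives at all.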

Table V. Detection rates of the different DGAs.

Classifier	Data set	CharBot (FPR=0.001)	DeepDGA (FPR=0.001)	DeceptionDGA (FPR=0.001)	CharBot (FPR=0.01)	DeepDGA (FPR=0.01)	DeceptionDGA (FPR=0.01)
LSTM.MI	AlexaBamb	5.58%	33.98%	4.02%	15.50%	39.53%	12.74%
	AlexaBamb + CharBot	55.19%	92.54%	19.69%	81.08%	98.44%	47.67%
	AlexaBamb + DeepDGA	12.39%	98.35%	7.34%	12.39%	98.35%	7.34%
	AlexaBamb + DeceptionDGA	23.59%	88.71%	40.29%	52.18%	96.66%	71.52%
	QnameBamb	15.25%	6.18%	16.61%	31.90%	19.51%	37.73%
	QnameBamb + CharBot	52.67%	42.48%	34.45%	81.96%	85.90%	66.27%
	QnameBamb + DeepDGA	27.84%	94.51%	24.25%	43.28%	99.61%	47.33%
	QnameBamb + DeceptionDGA	30.45%	15.97%	37.74%	53.31%	37.97%	24.25%
FANCI	AlexaBamb	—	—	—	3.05%	6.33%	1.66%
	AlexaBamb + CharBot	—	—	—	22.26%	12.12%	2.64%
	AlexaBamb + DeepDGA	—	—	—	6.45%	83.17%	2.08%
	AlexaBamb + DeceptionDGA	—	—	—	4.27%	6.57%	2.33%
	QnameBamb	—	—	—	21.43%	5.37%	46.85%
	QnameBamb + CharBot	—	—	—	48.44%	14.20%	49.62%
	QnameBamb + DeepDGA	—	—	—	45.13%	77.88%	50.11%
	QnameBamb + DeceptionDGA	—	—	—	44.75%	13.77%	50.45%
B-RF	AlexaBamb	1.69%	6.61%	1.38%	27.59%	23.97%	22.37%
	AlexaBamb + CharBot	1.84%	9.19%	1.20%	64.33%	34.06%	41.22%
	AlexaBamb + DeepDGA	4.54%	46.55%	2.86%	31.80%	84.12%	26.86%
	AlexaBamb + DeceptionDGA	2.00%	8.23%	1.34%	33.14%	24.51%	32.53%
	QnameBamb	18.80%	2.99%	41.67%	61.05%	22.57%	62.31%
	QnameBamb + CharBot	43.47%	12.54%	49.18%	85.82%	60.70%	79.33%
	QnameBamb + DeepDGA	44.04%	33.08%	49.80%	65.97%	98.84%	67.79%
	QnameBamb + DeceptionDGA	39.97%	10.53%	47.86%	62.29%	23.30%	67.74%

Table VI. Statistics of the domain name lengths for each data set.

Data Set	Mean	Standard Deviation
Alexa	14.30	4.70
Bambenek	21.80	6.42
Qname	23.99	7.97
CharBot	14.27	4.02
DeepDGA	28.52	9.06
DeceptionDGA	10.20	3.90

Equations

\displaystyle c(x,\tilde{x})=\begin{cases}d_{L}(x,\tilde{x})&\mbox{if $\tilde{x}$ is unregistered,}\\ \infty&\mbox{otherwise.}\end{cases}
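The cost function above can be sketched in a few lines. Two things are assumed here rather than taken from this page: that d_L denotes the Levenshtein (edit) distance, and that registration status is available through a caller-supplied predicate, called `is_unregistered` below as a hypothetical name. A real check would need a DNS or WHOIS lookup, which is deliberately left out.

```python
import math


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cost(x, x_tilde, is_unregistered):
    """c(x, x~): edit distance if x~ is unregistered, infinity otherwise.

    `is_unregistered` is a hypothetical callable supplied by the caller.
    """
    if is_unregistered(x_tilde):
        return levenshtein(x, x_tilde)
    return math.inf


if __name__ == "__main__":
    print(cost("google.com", "goagle.cam", lambda d: True))
    print(cost("google.com", "google.com", lambda d: False))
```

The infinite branch encodes the constraint that a registered name is never an acceptable candidate, regardless of how close it is to the seed.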

\displaystyle n\binom{\ell}{k}(m-1)^{k}.
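The counting expression above can be checked numerically on a toy case. The sketch below reads it as n times the binomial coefficient "ℓ choose k" times (m − 1)^k, and interprets the symbols as n seed names of length ℓ, k replaced character positions, and an alphabet of m characters. That interpretation is an assumption, since the symbols are not defined on this page, and the helper names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def count_variants(n, length, k, m):
    """Closed form: n * C(length, k) * (m - 1)**k."""
    return n * comb(length, k) * (m - 1) ** k


def enumerate_variants(seed, k, alphabet):
    """Brute force: all strings that differ from `seed` in exactly k positions.

    Assumes every character of `seed` belongs to `alphabet`, so each chosen
    position has exactly len(alphabet) - 1 replacement characters.
    """
    variants = set()
    for positions in combinations(range(len(seed)), k):
        choices = [[c for c in alphabet if c != seed[p]] for p in positions]
        for replacement in product(*choices):
            chars = list(seed)
            for p, c in zip(positions, replacement):
                chars[p] = c
            variants.add("".join(chars))
    return variants


if __name__ == "__main__":
    brute = len(enumerate_variants("abc", 2, "abcd"))
    closed = count_variants(n=1, length=3, k=2, m=4)
    print(brute, closed)
```

For a single seed the brute-force count and the closed form agree; with several seeds the product is an upper bound on distinct names, because different seeds can yield the same variant.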

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory

Full text

CharBot: A Simple and Effective Method for Evading DGA Classifiers

Jonathan Peck

Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium

Data Mining and Modeling for Biomedicine, VIB Inflammation Research Center, Ghent, 9052, Belgium

Claire Nie