Improving DNS Exfiltration Detection via Transformer Pretraining
Miloš Tomić, Aleksa Cvetanović, Predrag Tadić

TL;DR
This paper investigates how in-domain pretraining of Transformer models enhances subdomain-level detection of DNS exfiltration, especially at low false positive rates, through controlled experiments and ablations.
Contribution
It provides a controlled pipeline that isolates the effect of pretraining on DNS exfiltration detection and demonstrates significant detection improvements at low false positive rates.
Findings
Pretraining improves detection in the left tail of the ROC curve, i.e., at low false positive rates.
Additional pretraining steps help most when more labeled data are available for fine-tuning.
Pretrained models outperform randomly initialized baselines.
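The "left tail" of the ROC curve is the region of low false positive rates. A minimal sketch of summarizing detection there as the best true positive rate reachable under a fixed FPR budget follows. It assumes the detector outputs one score per subdomain (higher means more likely exfiltration) and labels 1 = exfiltration, 0 = benign; the function names and toy data are hypothetical, and the paper may use a different left-tail metric (for example, a partial AUC).

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (fpr, tpr) pairs, sweeping the threshold from high to low.

    A sample is flagged as exfiltration when its score is >= the threshold.
    Tied scores are processed together so no threshold splits them.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # consume every sample sharing this score before emitting a point
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def tpr_at_fpr(scores, labels, max_fpr):
    """Best TPR on the empirical ROC curve whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in roc_points(scores, labels) if fpr <= max_fpr)


# Hypothetical toy data: 4 exfiltration subdomains (label 1) and 6 benign ones (label 0).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
for budget in (0.0, 0.1, 0.2):
    print(f"TPR at FPR <= {budget}: {tpr_at_fpr(scores, labels, budget):.2f}")
```

Here a single high-scoring benign sample (score 0.8) caps the TPR at 0.50 until the budget admits one false positive, which shows how sensitive the left tail is to a few benign outliers.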
Abstract
We study whether in-domain pretraining of a Bidirectional Encoder Representations from Transformers (BERT) model improves subdomain-level detection of DNS exfiltration at low false positive rates. Previous work mostly examines fine-tuned generic Transformers and does not aim to isolate the effect of pretraining on the downstream classification task. To address this gap, we develop a controlled pipeline in which operating points are frozen on the validation set and transferred to the test set, enabling clean ablations across different label and pretraining budgets. Our results show significant improvements in the left tail of the Receiver Operating Characteristic (ROC) curve, especially relative to a randomly initialized baseline. Additionally, among pretrained model variants, increasing the number of pretraining steps helps most when more labeled data are available for fine-tuning.
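The idea of freezing an operating point on validation and transferring it to test can be sketched as follows: pick the decision threshold on the validation set so that its false positive rate stays within a target budget, then reuse that exact threshold on the test set instead of re-tuning it. This is a minimal illustration, not the authors' pipeline; it assumes higher scores indicate exfiltration and labels are 0 (benign) / 1 (exfiltration), and the helper names `rates` and `freeze_threshold`, the toy scores, and the target FPR of 0.2 are hypothetical.

```python
import math


def rates(scores, labels, threshold):
    """Return (fpr, tpr) when every sample with score >= threshold is flagged."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fp / neg, tp / pos


def freeze_threshold(val_scores, val_labels, target_fpr):
    """Lowest validation score usable as a threshold with validation FPR <= target_fpr.

    Returns math.inf (flag nothing) if even the highest score breaks the budget.
    """
    best = math.inf
    for t in sorted(set(val_scores), reverse=True):
        fpr, _ = rates(val_scores, val_labels, t)
        if fpr > target_fpr:
            break  # FPR can only grow as the threshold is lowered
        best = t
    return best


# Hypothetical validation and test scores (1 = exfiltration, 0 = benign).
val_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
val_labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
test_scores = [0.99, 0.85, 0.65, 0.55, 0.7, 0.5, 0.3, 0.1]
test_labels = [1, 1, 1, 1, 0, 0, 0, 0]

threshold = freeze_threshold(val_scores, val_labels, target_fpr=0.2)
test_fpr, test_tpr = rates(test_scores, test_labels, threshold)  # reused as-is, not re-tuned
print(f"threshold={threshold}, test FPR={test_fpr:.2f}, test TPR={test_tpr:.2f}")
```

On this toy data the frozen threshold of 0.6 meets the budget on validation (FPR 1/6) but yields a test FPR of 0.25. Reporting test metrics at a transferred threshold, rather than choosing one on the test set, keeps that kind of drift visible and makes comparisons across label and pretraining budgets fairer.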
