DNS-GT: A Graph-based Transformer Approach to Learn Embeddings of Domain Names from DNS Queries
Massimiliano Altieri, Ronan Hamon, Roberto Corizzo, Michelangelo Ceci, Ignacio Sanchez

TL;DR
DNS-GT is a novel Transformer-based approach that learns contextual embeddings of domain names from sequences of DNS queries, improving performance on intrusion detection and domain classification tasks.
Contribution
The paper introduces a self-supervised Transformer model for DNS data that captures the context of each query, improving representation learning for security applications.
Findings
Outperforms baseline methods in domain classification.
Achieves higher accuracy in botnet detection.
Demonstrates effective learning of DNS query behavior.
Abstract
Network intrusion detection systems play a crucial role in the security strategy employed by organisations to detect and prevent cyberattacks. Such systems usually combine pattern detection signatures with anomaly detection techniques powered by machine learning methods. However, the commonly proposed machine learning methods present drawbacks such as over-reliance on labeled data and limited generalization capabilities. To address these issues, embedding-based methods have been introduced to learn representations from network data, such as DNS traffic, mainly due to its large availability, that generalise effectively to many downstream tasks. However, current approaches do not properly consider contextual information among DNS queries. In this paper, we tackle this issue by proposing DNS-GT, a novel Transformer-based model that learns embeddings for domain names from sequences of DNS…
Taxonomy
Topics: Network Security and Intrusion Detection · Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting
