Understanding IoT Domain Names: Analysis and Classification Using Machine Learning
Ibrahim Ayoub, Martine S. Lenders, Benoît Ampeau, Sandoche Balakrichenan, Kinda Khawam, Thomas C. Schmidt, Matthias Wählisch

TL;DR
This paper analyzes the domain names of servers contacted by IoT devices and uses machine learning, specifically Word2vec embeddings and a Random Forest classifier, to distinguish them from domain names contacted by other types of devices, providing insights for security and protocol design.
Contribution
It introduces a machine learning approach with word embeddings for classifying IoT server domains, achieving high accuracy and offering new insights for IoT network management.
Findings
Among the six models trained, Random Forest achieved the highest accuracy, precision, recall, and F1 score.
Word2vec was used to obtain real-valued vector representations of domain names for the ML models.
The study provides insights into IoT domain name characteristics.
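For readers unfamiliar with the four scores used to compare the models (accuracy, precision, recall, F1), the following is a minimal sketch of how they follow from confusion-matrix counts. The function name `classification_metrics` and the counts in the usage line are made up for illustration; they are not results from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute binary classification scores from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "IoT domain" assumed to be the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so degenerate inputs do not raise.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the paper).
scores = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Precision penalizes non-IoT domains wrongly labeled as IoT, recall penalizes IoT domains that are missed, and F1 is their harmonic mean.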
Abstract
In this paper, we investigate the domain names of servers on the Internet that are accessed by IoT devices performing machine-to-machine communications. Using machine learning, we classify between them and domain names of servers contacted by other types of devices. By surveying past studies that used testbeds with real-world devices and using lists of top visited websites, we construct lists of domain names of both types of servers. We study the statistical properties of the domain name lists and train six machine learning models to perform the classification. The word embedding technique we use to get the real-value representation of the domain names is Word2vec. Among the models we train, Random Forest achieves the highest performance in classifying the domain names, yielding the highest accuracy, precision, recall, and F1 score. Our work offers novel insights to IoT, potentially…
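The abstract mentions studying statistical properties of the domain name lists without naming them here. As a rough sketch of what such an analysis could look like, the snippet below computes a few simple per-domain statistics. The chosen features (length, label count, longest label, digit ratio, character entropy), the function names, and the example domains are assumptions for illustration, not the paper's feature set.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a non-empty string."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_stats(domain: str) -> dict:
    """Simple statistics for one non-empty domain name (hypothetical features)."""
    name = domain.lower().rstrip(".")   # normalize case, drop a trailing root dot
    labels = name.split(".")            # e.g. ["device-api", "example", "com"]
    chars = name.replace(".", "")       # characters excluding label separators
    return {
        "length": len(name),
        "num_labels": len(labels),
        "longest_label": max(len(label) for label in labels),
        "digit_ratio": sum(c.isdigit() for c in chars) / len(chars),
        "entropy": shannon_entropy(chars),
    }


# Placeholder domains under example.com / example.org, not from the paper's lists.
for d in ["device-api.example.com", "www.example.org"]:
    print(d, domain_stats(d))
```

Comparing the distributions of such statistics across the two lists is one way to see whether IoT server names differ structurally from popular website names before training any classifier.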
Taxonomy
Topics: Text and Document Classification Technologies
