IoT Device Identification with Machine Learning: Common Pitfalls and Best Practices

Kahraman Kostas; Rabia Yasa Kostas

arXiv:2601.20548 · cs.CR · January 29, 2026

TL;DR

This paper reviews machine-learning-based IoT device identification, highlights common pitfalls, and offers best practices for improving the robustness, reproducibility, and generalizability of models used in IoT security applications.

Contribution

The paper critically analyzes existing methods, identifies key errors, and offers guidelines for addressing the challenges of IoT device identification with machine learning.

Findings

01

Identifies improper data augmentation as a common error.

02

Highlights issues with misleading session identifiers.

03

Provides best practices for reproducibility and generalization.
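This page does not say which augmentation errors the paper catalogs. One widely discussed form is oversampling (duplicating) minority-device samples before the train/test split, so copies of the same sample land in both sets and inflate test scores. The Python sketch below illustrates only that case, using the standard library. The toy data, the device labels, and the helpers `random_oversample`, `random_split`, and `overlap` are hypothetical; they are not the paper's method or any library's API.

```python
import random
from collections import Counter


def make_toy_dataset(n_major=100, n_minor=10, seed=0):
    """Hypothetical imbalanced data; each row is distinct by construction."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_major):
        rows.append((i, rng.random()))
        labels.append("camera")
    for i in range(n_minor):
        rows.append((n_major + i, rng.random()))
        labels.append("plug")
    return rows, labels


def random_oversample(rows, labels, seed=0):
    """Naive augmentation: duplicate rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_split(rows, labels, test_frac=0.3, seed=0):
    """Shuffle indices and cut them into a train part and a test part."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = round(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])


def overlap(train_rows, test_rows):
    """Count test rows that also appear verbatim in the training set."""
    train_set = set(train_rows)
    return sum(1 for r in test_rows if r in train_set)


rows, labels = make_toy_dataset()

# Leaky: augment first, then split, so duplicates can straddle the split.
aug_rows, aug_labels = random_oversample(rows, labels)
tr_r, tr_y, te_r, te_y = random_split(aug_rows, aug_labels)
leaky = overlap(tr_r, te_r)

# Safer: split first, then augment only the training portion.
tr_r, tr_y, te_r, te_y = random_split(rows, labels)
tr_r, tr_y = random_oversample(tr_r, tr_y)
clean = overlap(tr_r, te_r)

print(f"leaky overlap={leaky}, clean overlap={clean}")
```

The original rows are distinct, so the split-first pipeline always has zero overlap. The augment-first pipeline almost always places copies of some minority rows in both sets, and a model evaluated on it is partly tested on data it has already seen.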

Abstract

This paper critically examines the device identification process using machine learning, addressing common pitfalls in the existing literature. We analyze the trade-offs between identification methods (unique vs. class-based), data heterogeneity, feature extraction challenges, and evaluation metrics. By highlighting specific errors, such as improper data augmentation and misleading session identifiers, we provide robust guidelines for researchers to enhance the reproducibility and generalizability of IoT security models.
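"Misleading session identifiers" is not defined on this page. One plausible reading is that fields identifying a session or host, such as a session ID or source IP, let a model memorize sessions instead of learning device behavior. This is especially likely when packets from one session are spread across train and test. The Python sketch below is hypothetical and is not the authors' method. It assigns whole sessions to one side of the split and drops identifier-like fields. The field names in `IDENTIFIER_FIELDS` and the toy records are assumptions.

```python
import random

# Hypothetical identifier-like fields; which fields count as identifiers is an assumption.
IDENTIFIER_FIELDS = {"session_id", "src_ip", "src_mac"}


def group_split(records, group_key="session_id", test_frac=0.3, seed=0):
    """Assign whole sessions to either train or test, never both."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


def drop_identifiers(records, fields=IDENTIFIER_FIELDS):
    """Remove identifier-like fields; the "device" label stays as the target."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]


# Toy packets: 10 sessions x 5 packets each, two hypothetical device labels.
records = [
    {"session_id": s, "src_ip": f"10.0.0.{s % 4}", "pkt_len": 60 + 10 * p + s,
     "device": "camera" if s % 2 == 0 else "plug"}
    for s in range(10) for p in range(5)
]

train, test = group_split(records)
shared = {r["session_id"] for r in train} & {r["session_id"] for r in test}
X_train = drop_identifiers(train)
X_test = drop_identifiers(test)
print(len(train), len(test), shared, sorted(X_train[0]))
```

Splitting by session alone does not stop a model from exploiting an identifier field that repeats across sessions, such as the same source IP. Dropping those fields alone does not stop packets from one session from appearing on both sides of the split. The sketch therefore applies both steps.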

Taxonomy

Topics: Internet Traffic Analysis and Secure E-voting · Advanced Malware Detection Techniques · User Authentication and Security Systems