Wireless Dataset Similarity: Measuring Distances in Supervised and Unsupervised Machine Learning
João Morais, Sadjad Alikhani, Akshay Malhotra, Shahab Hamidi-Rad, Ahmed Alkhateeb

TL;DR
This paper proposes a framework for measuring wireless dataset similarity tailored to specific tasks and models, enabling better dataset selection, transferability prediction, and synthetic data generation in wireless machine learning applications.
Contribution
It introduces a novel, task- and model-aware dataset distance framework that improves prediction of transferability and outperforms traditional metrics in wireless data scenarios.
Findings
Pearson correlation exceeding 0.85 between dataset distances and cross-dataset (train-on-one/test-on-another) performance on the CSI compression task.
Task-specific distance metrics outperform traditional baselines.
Framework applicable to both supervised and unsupervised wireless learning tasks.
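A minimal sketch of how the correlation finding can be checked. It assumes a square matrix of pairwise dataset distances and a matching matrix of cross-dataset errors, where entry (i, j) is the error of a model trained on dataset i and tested on dataset j. The helper names `pearson` and `off_diagonal`, the choice to drop the diagonal (train and test on the same dataset), and the use of an error metric (so that a positive correlation is the desired outcome; with an accuracy-style metric the expected sign would be negative) are all assumptions for illustration, not details confirmed by the paper. The matrices are made-up toy values, not results.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def off_diagonal(matrix: List[List[float]]) -> List[float]:
    """Flatten a square matrix row by row, skipping the diagonal (train == test)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


# Hypothetical toy values for three datasets, for illustration only.
# dist[i][j]: distance between datasets i and j (symmetric).
dist = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 3.0],
    [4.0, 3.0, 0.0],
]
# err[i][j]: error of a model trained on dataset i and tested on dataset j
# (e.g. a reconstruction error for CSI compression; not necessarily symmetric).
err = [
    [0.10, 0.20, 0.50],
    [0.25, 0.10, 0.40],
    [0.55, 0.45, 0.10],
]

r = pearson(off_diagonal(dist), off_diagonal(err))
print(round(r, 2))  # 0.98
```

Flattening both matrices in the same row-major order keeps each distance aligned with the error of the corresponding train/test pair; the distance matrix is symmetric, but the error matrix need not be.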
Abstract
This paper introduces a task- and model-aware framework for measuring similarity between wireless datasets, enabling applications such as dataset selection/augmentation, simulation-to-real (sim2real) comparison, task-specific synthetic data generation, and informing decisions on model training/adaptation to new deployments. We evaluate candidate dataset distance metrics by how well they predict cross-dataset transferability: if two datasets have a small distance, a model trained on one should perform well on the other. We apply the framework on an unsupervised task, channel state information (CSI) compression, using autoencoders. Using metrics based on UMAP embeddings, combined with Wasserstein and Euclidean distances, we achieve Pearson correlations exceeding 0.85 between dataset distances and train-on-one/test-on-another task performance. We also apply the framework to a supervised…
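The abstract combines low-dimensional (UMAP) embeddings with Wasserstein and Euclidean distances but does not spell out how each distance is applied. The sketch below assumes each dataset has already been embedded into a small set of points (the UMAP step itself is not shown) and compares two ways of measuring the gap between the resulting point clouds: the Euclidean distance between their centroids (an assumed pairing), and the exact 1-Wasserstein distance for equally sized, uniformly weighted sets. All function names and points are hypothetical, and the brute-force matching is only practical for a handful of points; this is not the authors' implementation.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distance(x: List[Point], y: List[Point]) -> float:
    """Euclidean distance between the centroids of two embedded datasets."""
    return euclidean(centroid(x), centroid(y))


def wasserstein1_equal_size(x: List[Point], y: List[Point]) -> float:
    """Exact 1-Wasserstein distance between two uniform empirical
    distributions with the same number of points.

    With equal sizes and uniform weights, some optimal transport plan is a
    permutation (Birkhoff-von Neumann), so W1 is the smallest mean
    ground distance over all one-to-one matchings. Brute force is O(n!),
    so this is only usable on toy inputs.
    """
    if len(x) != len(y):
        raise ValueError("this sketch requires equally sized point sets")
    best = math.inf
    for perm in itertools.permutations(range(len(y))):
        cost = sum(euclidean(x[i], y[j]) for i, j in enumerate(perm)) / len(x)
        best = min(best, cost)
    return best


# Hypothetical 2-D "embeddings" standing in for UMAP output.
a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
b = [(2.0, 0.0), (3.0, 0.0), (2.0, 1.0)]  # a shifted by (2, 0)
print(round(centroid_distance(a, b), 6))        # 2.0
print(round(wasserstein1_equal_size(a, b), 6))  # 2.0

# Same centroid, different arrangement: the centroid distance misses the difference.
c = [(-1.0, 0.0), (1.0, 0.0)]
d = [(0.0, -1.0), (0.0, 1.0)]
print(round(centroid_distance(c, d), 6))        # 0.0
print(round(wasserstein1_equal_size(c, d), 6))  # 1.414214
```

For a pure translation both measures agree, but for two clouds with the same centroid the centroid distance is zero while the Wasserstein distance is not, which illustrates why a distribution-level metric can pick up differences that a mean-based Euclidean distance ignores. For realistic dataset sizes, an optimal-transport solver would replace the brute-force matching.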
Taxonomy
Topics: Wireless Signal Modulation Classification · Indoor and Outdoor Localization Technologies · Millimeter-Wave Propagation and Modeling
