TL;DR
EMind is a large-scale foundation model for electromagnetic signals that leverages physical properties and a unified dataset to enable cross-task generalization and efficient multi-source signal understanding.
Contribution
This work introduces EMind, the first unified electromagnetic signals foundation model, with a large standardized dataset and novel training strategies for multi-task learning.
Findings
EMind achieves strong performance across multiple downstream tasks.
The model demonstrates broad generalization capabilities.
Efficient learning from heterogeneous multi-source signals is enabled.
Abstract
Deep understanding of electromagnetic signals is fundamental to dynamic spectrum management, intelligent transportation, autonomous driving and unmanned vehicle perception. The field faces challenges because electromagnetic signals differ greatly from text and images, showing high heterogeneity, strong background noise and complex joint time-frequency structure, which prevents existing general models from direct use. Electromagnetic communication and sensing tasks are diverse, current methods lack cross-task generalization and transfer efficiency, and the scarcity of large high-quality datasets blocks the creation of a truly general multitask learning framework. To overcome these issues, we introduce EMind, an electromagnetic signals foundation model that bridges large-scale pretraining and the unique nature of this modality. We build the first unified and largest standardized…
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
(1) The introduction of EMBench is a valuable contribution to the EM perception community. It consolidates datasets from multiple subdomains such as materials, gestures, imaging, and localization, creating a standardized testbed for multi-task learning and facilitating fair comparisons among future EM-based models. (2) The proposed architecture effectively supports cross-task transfer through its task-specific attention mechanism, allowing shared backbone features to adapt to diverse sensing ob
(1) Many of the datasets used in the study are relatively small, proprietary, or limited to specific sensors such as radar-based gesture datasets. This raises concerns about the generalizability of the model across broader EM modalities, including mmWave, MRI, and THz imaging, which exhibit significantly different propagation characteristics and data properties. (2) The assumption that EM signals from various modalities can be uniformly transformed into 2D embeddings may be restrictive. Differe
- The EMdata81M dataset is highly valuable. The curation of the data sources and the careful consideration of task types and source types are invaluable for further research on ML for electromagnetic waves.
- The methodological contribution of the paper is limited. Both pretraining methods for the EMind foundation model are exhaustively used in (pre-)training strategies or learning schedules. First, the length-adaptive multi-signal packing is somewhat trivial. As I understand it, smaller packets are embedded sequentially into a sequence, divided by cls and sampling rate tokens. As the foundation model applies masking for self-supervised pre-training, masking needs to be adaptive to the respective sampl
- The authors are providing a valuable dataset for the community to train new models for EM signals.
- The dataset is large in scale and has great diversity.
- Task types evaluated are comprehensive.
- **Lack of clarification and verification on the hardware-aware module proposed** The hardware-aware adjustable dataset weighting is vaguely described and lacks quantitative or ablation results. It is unclear what “hardware-aware” means, whether it relates to GPU memory, CPU buffering, or real-time scheduling, and no analysis is provided to confirm its effect on convergence or generalization.
- **Limited novelty** The model architecture follows a standard masked autoenco
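One reviewer above reads the length-adaptive multi-signal packing as placing shorter signals one after another in a single sequence, separated by cls and sampling rate tokens. The following is a minimal sketch of that reading only, not the authors' implementation. The token strings `[CLS]`, `[PAD]` and `[SR=...]`, the function names, and the greedy fill strategy are all assumptions made for illustration; the actual model may pack embedded patches rather than raw samples.

```python
from typing import List, Sequence, Tuple, Union

Token = Union[str, float]

CLS = "[CLS]"  # hypothetical separator token marking the start of a signal
PAD = "[PAD]"  # hypothetical padding token used to fill a sequence


def rate_token(sampling_rate_hz: int) -> str:
    """Build a hypothetical token that encodes a signal's sampling rate."""
    return f"[SR={sampling_rate_hz}]"


def pack_signals(
    signals: Sequence[Tuple[Sequence[float], int]],
    max_len: int,
) -> List[List[Token]]:
    """Greedily pack (samples, sampling_rate_hz) pairs into fixed-length sequences.

    Each signal becomes [CLS, rate token, *samples]. A new sequence is started
    whenever the next signal would not fit, and the old one is padded.
    """
    sequences: List[List[Token]] = []
    current: List[Token] = []
    for samples, rate in signals:
        segment: List[Token] = [CLS, rate_token(rate), *samples]
        if len(segment) > max_len:
            raise ValueError("signal does not fit in one sequence")
        if len(current) + len(segment) > max_len:
            # Close the current sequence and pad it to the fixed length.
            sequences.append(current + [PAD] * (max_len - len(current)))
            current = []
        current.extend(segment)
    if current:
        sequences.append(current + [PAD] * (max_len - len(current)))
    return sequences


packed = pack_signals(
    [([0.1, 0.2], 100), ([0.3], 200), ([0.4, 0.5, 0.6], 100)],
    max_len=8,
)
```

In this example the first two signals share one sequence and the third starts a new one. A masking step for self-supervised pretraining would then need to skip the separator, rate, and padding positions, which is the kind of interaction the reviewer raises.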
