Simplicity is Key: An Unsupervised Pretraining Approach for Sparse Radio Channels
Jonathan Ott, Maximilian Stahlke, Tobias Feigl, Bjoern M. Eskofier, Christopher Mutschler

TL;DR
This paper introduces SpaRTran, an unsupervised pretraining method for wireless channels that incorporates domain knowledge through a compressed sensing model, significantly improving radio localization and beamforming accuracy.
Contribution
SpaRTran is the first hybrid unsupervised learning approach that integrates physical wireless channel models with transformer architectures, enhancing performance over existing methods.
Findings
Reduces positioning error by up to 28%
Increases beamforming codebook accuracy by 26 percentage points
Improves performance on network optimization and radio-localization tasks
Abstract
Unsupervised representation learning for wireless channel state information (CSI) reduces reliance on labeled data, thereby lowering annotation costs, and often improves performance on downstream tasks. However, state-of-the-art approaches take little or no account of domain-specific knowledge, forcing the model to learn well-known concepts solely from data. We introduce Sparse pretrained Radio Transformer (SpaRTran), a hybrid method based on the concept of compressed sensing for wireless channels. In contrast to existing work, SpaRTran builds around a wireless channel model that constrains the optimization procedure to physically meaningful solutions and induces a strong inductive bias. Compared to the state of the art, SpaRTran cuts positioning error by up to 28% and increases top-1 codebook selection accuracy for beamforming by 26 percentage points. Our results show that capturing the…
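As a rough illustration of the compressed sensing idea the abstract refers to, the sketch below models a channel as a sparse sum of path atoms on a delay grid and recovers it from a subset of subcarriers with orthogonal matching pursuit (OMP). This is not SpaRTran's method: OMP, the integer delay grid, the function names (`delay_dictionary`, `omp`), and all sizes and path gains are assumptions chosen for illustration.

```python
import numpy as np

def delay_dictionary(subcarriers, n_delays):
    """Unit-norm atoms: frequency response of one path per delay-grid point (assumed model)."""
    k = np.asarray(subcarriers)[:, None]      # observed subcarrier indices, shape (M, 1)
    tau = np.arange(n_delays)[None, :]        # delay grid, shape (1, N)
    return np.exp(-2j * np.pi * k * tau / n_delays) / np.sqrt(len(subcarriers))

def omp(y, D, n_paths):
    """Orthogonal matching pursuit: pick one atom per step, refit all gains by least squares."""
    residual = y.copy()
    support = []
    gains = np.zeros(0, dtype=complex)
    for _ in range(n_paths):
        corr = np.abs(D.conj().T @ residual)  # match of every atom with the residual
        if support:
            corr[support] = 0.0               # never select the same atom twice
        support.append(int(np.argmax(corr)))
        gains, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ gains
    x = np.zeros(D.shape[1], dtype=complex)
    x[support] = gains
    return x

# Hypothetical setup: 64 delay bins, 48 randomly observed subcarriers, 3 paths.
rng = np.random.default_rng(0)
n_delays, n_obs = 64, 48
subcarriers = np.sort(rng.choice(n_delays, size=n_obs, replace=False))
D = delay_dictionary(subcarriers, n_delays)

x_true = np.zeros(n_delays, dtype=complex)
x_true[[3, 17, 40]] = [1.0, 0.6 * np.exp(0.8j), 0.3 * np.exp(-2.1j)]  # complex path gains
y = D @ x_true                                # noiseless measurement of the sparse channel

x_hat = omp(y, D, n_paths=3)
print(np.flatnonzero(np.abs(x_hat) > 1e-6), np.linalg.norm(x_hat - x_true))
```

The point of the sketch is the inductive bias: the channel is fully described by a few (delay, complex gain) pairs, so far fewer measurements than grid points suffice to recover it in this noiseless toy setting.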
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
1. The proposed method builds on a well-established sparse multipath channel model and applies a compressed sensing-based method. The formulation is well motivated and captures the nature of wireless channels. This makes the architecture design of the learning model more interpretable. 2. The gain over the existing methods is illustrated with empirical experiments over two different tasks and several datasets.
1. The novelty is clearly overclaimed; applying compressed sensing to the design of unsupervised pretraining is not new. In a brief search of the existing literature, I found several works that have explored this idea, e.g., Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization, AISTATS 2019. What is novel here is the compressed sensing-inspired unsupervised pretraining method for CSI representation in wireless channels. 2. The theory section
One strength of the solution is that it is based on a system-agnostic channel assumption that enables its use across different communication systems. The theoretical analysis in the paper is strong and motivates the sparsity assumption of time domain wireless channels. Compared to baselines, SpaRTran achieves superior performance in both downstream tasks across different datasets.
- This work demonstrates limited novelty, as the authors simply extend the gated SAE to handle complex-valued inputs and replace the ReLU activation with LeakyReLU. A major limitation of the work is the encoder's awareness of spatial characteristics. The authors try to circumvent that problem by learning those spatial characteristics separately for each task from scratch by using a ResNet with 1D convolutions added on top of the pretrained encoder. However, this requires processing of every pair
- Modeling channels as sparse sums of path atoms, then learning a dictionary and complex phases, is well-matched to multipath propagation and potentially avoids mismatches seen in generic SSL. - Clear architectural choices. The gate–magnitude decoupling (plus auxiliary reconstruction through the gate) is thoughtful. - System-agnostic pretraining. Training on single-channel measurements (not full CSI) makes the foundation model less tied to a specific antenna configuration and reduces labeling/
- I don't fully understand the motivation for using a sparse representation. If you can learn an embedding and a dictionary, can't you get small embeddings (as opposed to sparse but large embeddings), which would make fine-tuning for beamforming more practical? - I don't understand the purpose of Theorems 1 and 2. Specifically, if your main contribution is the gated sparse autoencoder, then the operators constructed by hand in the theorems have no connection to your actual algorithm. - The a
Results look good.
Even though I have some background in physical-layer wireless processing and machine learning, I have a hard time understanding what is happening in the paper. I can understand that the motivation is to find a sparse representation for the channel in terms of a dictionary, but two major questions remain after reading the manuscript: why would I do that, and how would I apply the approach in practice? The paper seems to be a mix of results from previous research with some specific terminolog
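The gate-magnitude decoupling that the reviews mention can be sketched as a forward pass, following the general gated sparse autoencoder recipe: a binary gate decides which dictionary atoms are active, and a separate path sets their strength. This is only a sketch under assumptions, not the authors' implementation: stacking real and imaginary parts, the shared projection rescaled by `exp(r_mag)`, and the names `gated_encode` and `leaky_relu` are all hypothetical choices.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """LeakyReLU, which one review says replaces ReLU in the magnitude path."""
    return np.where(z > 0, z, slope * z)

def gated_encode(h, W_gate, b_gate, r_mag, b_mag):
    """Gated sparse encoder forward pass (assumed form, not taken from the paper)."""
    x = np.concatenate([h.real, h.imag])      # assumption: complex input as stacked reals
    pre = W_gate @ x                          # projection shared by both paths
    gate = (pre + b_gate > 0).astype(float)   # which atoms are active (binary)
    magnitude = leaky_relu(np.exp(r_mag) * pre + b_mag)  # how strong each atom is
    return gate * magnitude, gate

# Toy usage with hand-picked parameters so the result is easy to follow.
h = np.array([1.0 + 0.0j, -2.0 + 0.0j])      # two complex channel taps
W_gate = np.eye(4)
code, gate = gated_encode(h, W_gate, b_gate=-0.5 * np.ones(4),
                          r_mag=np.zeros(4), b_mag=np.zeros(4))
print(gate, code)
```

Separating the two paths lets a sparsity penalty act on the gate alone without shrinking the magnitudes of the atoms that remain active.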
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Wireless Signal Modulation Classification · Sparse and Compressive Sensing Techniques
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Layer Normalization · Focus · Byte Pair Encoding · Label Smoothing · Adam · Softmax
