The Radio-Frequency Transformer for Signal Separation
Egor Lifar, Semyon Savkin, Rachana Madhukara, Tejas Jayashankar, Yury Polyanskiy, Gregory W. Wornell

TL;DR
This paper introduces a data-driven transformer-based method for RF signal separation that learns a discrete tokenizer and achieves significant improvements in bit-error rate, with potential applications beyond radio-frequency data.
Contribution
It presents a novel transformer-based signal separator with a learned tokenizer that outperforms traditional methods and generalizes to unseen interference types.
Findings
122x reduction in bit-error rate over prior methods
Zero-shot generalization to unseen mixtures
Effective on real and synthetic RF data
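For readers unfamiliar with the metric behind the headline finding, the following is a minimal sketch of how a bit-error rate is computed for a QPSK signal. The Gray-style bit-to-symbol mapping and the helper names (`qpsk_modulate`, `qpsk_demodulate`, `bit_error_rate`) are assumptions made for illustration; they are not taken from the paper or from the MIT RF Challenge code.

```python
import math

def qpsk_modulate(bits):
    """Map bit pairs to unit-energy QPSK symbols (assumed mapping: 0 -> +, 1 -> -)."""
    s = 1 / math.sqrt(2)
    return [complex(s * (1 - 2 * bits[i]), s * (1 - 2 * bits[i + 1]))
            for i in range(0, len(bits), 2)]

def qpsk_demodulate(symbols):
    """Hard-decision demodulation: read each bit from the sign of one axis."""
    bits = []
    for y in symbols:
        bits.append(0 if y.real >= 0 else 1)
        bits.append(0 if y.imag >= 0 else 1)
    return bits

def bit_error_rate(sent, received):
    """Fraction of bit positions where the decoded bit differs from the sent bit."""
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)

sent = [0, 0, 0, 1, 1, 0, 1, 1]
symbols = qpsk_modulate(sent)
symbols[0] = -symbols[0]  # hypothetical corruption: flips both bits of the first symbol
print(bit_error_rate(sent, qpsk_demodulate(symbols)))  # 2 wrong bits out of 8
```

Under this definition, an "N-fold reduction" in BER is the ratio of a baseline's BER to the new method's BER at the same operating point.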
Abstract
We study a problem of signal separation: estimating a signal of interest (SOI) contaminated by an unknown non-Gaussian background/interference. Given training data consisting of examples of SOI and interference, we show how to build a fully data-driven signal separator. To that end, we learn a good discrete tokenizer for SOI and then train an end-to-end transformer on a cross-entropy loss. Training with cross-entropy shows substantial improvements over the conventional mean-squared error (MSE). Our tokenizer is a modification of Google's SoundStream, which incorporates additional transformer layers and switches from VQVAE to finite-scalar quantization (FSQ). Across real and synthetic mixtures from the MIT RF Challenge dataset, our method achieves competitive performance, including a 122x reduction in bit-error rate (BER) over prior state-of-the-art techniques for separating a QPSK…
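As a rough illustration of the finite-scalar quantization idea mentioned in the abstract, the sketch below bounds each latent coordinate with `tanh`, rounds it to one of a small number of integer levels, and packs the result into a single token index. The level counts `[3, 5, 5]`, the restriction to odd level counts, and the function names are assumptions chosen for brevity; the paper's actual tokenizer configuration is not given in this summary, and the straight-through gradient used during training is omitted.

```python
import math

def fsq_quantize(z, levels):
    """Round each coordinate of z to an integer in [-(L-1)/2, (L-1)/2].

    Simplification (assumption): every L in `levels` is odd, so the grid is
    symmetric around zero and no half-step offset is needed.
    """
    codes = []
    for x, L in zip(z, levels):
        half = (L - 1) / 2
        bounded = math.tanh(x) * half      # squash into (-half, half)
        codes.append(int(round(bounded)))  # snap to the nearest integer level
    return codes

def fsq_token(codes, levels):
    """Pack per-coordinate codes into one token index (mixed-radix encoding)."""
    token = 0
    for c, L in zip(codes, levels):
        token = token * L + (c + (L - 1) // 2)  # shift code to 0..L-1
    return token

levels = [3, 5, 5]                      # hypothetical; implied codebook size 3*5*5 = 75
codes = fsq_quantize([0.0, 2.0, -2.0], levels)
print(codes, fsq_token(codes, levels))  # [0, 2, -2] 45
```

Because the codebook is an implicit product grid rather than a learned table, there are no codebook vectors to update or collapse, which is the usual motivation for preferring FSQ over VQ-VAE-style quantization. The resulting token indices are what a cross-entropy objective would classify over.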
Peer Reviews
Decision · Submitted to ICLR 2026
The paper tackles a practically relevant and challenging problem -- signal separation under RF interference. The use of cross-entropy loss on quantized token sequences, rather than waveform-level MSE, is technically appropriate for a discrete latent representation and improves compatibility with communication metrics such as BER. The writing and experimental presentation are clear and organized, helping reproducibility.
The model is a relatively straightforward adaptation of existing architectures (SoundStream tokenizer + transformer) to an RF dataset. Architectural, theoretical, or algorithmic innovations are limited. Evaluation is restricted to the MIT RF Challenge dataset. The paper provides no quantitative evidence on why each design choice (tokenization depth, transformer depth, etc.) matters, making the results difficult to interpret. Assertions of robustness to unseen interference are based on synthet
1. The paper steps outside the well-trodden domain of audio source separation and applies modern sequence-to-sequence modeling to the more constrained problem of RF signal separation. The domain is very intriguing and the success is measured by the unforgiving metric of Bit Error Rate (BER), not perceptual audio quality. By successfully adapting these advanced architectures to this field, the authors bridge a critical gap between mainstream deep learning and a specialized engineering domain with
The paper has some weaknesses, which I list in decreasing order of significance. 1. The authors should situate their contributions with proper citations of previous methods in the field. The first known published method to perform separation of any signal in a continuous latent domain for neural networks was proposed in [D] and, closer to the contribution of this paper, the first method to formulate the separation problem as a classification-like probl
The method showcases very good empirical results on common benchmarks in the wireless communications domain, surpassing current state-of-the-art models of the latest ICASSP challenge on wireless signal source separation. I also appreciate the effort put into making the model real-time, since this is the typical real-world use-case. Finally, the experiments on additive Gaussian interference and the ablations are well received, showcasing improvement over classic baselines such as matched filteri
My thought reading this paper is that it would rank low on novelty, since variants of the proposed method (conditional generation via autoregressive transformers for signal processing) are becoming more ubiquitous, for example in audio source separation [1] and accompaniment generation [2]. Also, [3] should be mentioned as the first method to perform source separation in a quantised autoencoder domain via autoregressive transformers. Nevertheless in the application domain of wireless communica
Taxonomy
Topics: Pulsars and Gravitational Waves Research · Sparse and Compressive Sensing Techniques · Speech and Audio Processing
