# Enhancing Automatic Modulation Recognition With a Reconstruction-Driven Vision Transformer Under Limited Labels

**Authors:** Hossein Ahmadi, Banafsheh Saffari, Sajjad Emdadi Mahdimahalleh, Mohammad Esmaeil Safari, Aria Ahmadi

arXiv: 2508.20193 · 2025-09-12

## TL;DR

This paper introduces a Vision Transformer-based framework for automatic modulation recognition (AMR) that learns from limited labeled data by jointly training on supervised, self-supervised, and reconstruction objectives, improving accuracy and robustness over supervised baselines.
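One plausible way to combine the three objectives in a single training step is a weighted sum in which the supervised term is averaged only over the labeled samples in a batch. The sketch below assumes this form. The weights `w_sup`, `w_ssl`, `w_rec`, the function names, and the use of softmax cross-entropy are illustrative assumptions. The self-supervised and reconstruction losses are passed in as precomputed scalars because the summary does not specify their exact form.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def combined_loss(batch_logits, labels, ssl_loss, rec_loss,
                  w_sup=1.0, w_ssl=1.0, w_rec=1.0):
    """Hypothetical weighted sum of supervised, self-supervised and reconstruction terms.

    labels[i] is None for unlabeled samples: they add nothing to the supervised
    term and only influence ssl_loss and rec_loss, which are computed elsewhere.
    The default weights are placeholders, not values reported in the paper.
    """
    sup_terms = [cross_entropy(lg, y)
                 for lg, y in zip(batch_logits, labels) if y is not None]
    # Average over labeled samples only; a fully unlabeled batch has no supervised term.
    sup = sum(sup_terms) / len(sup_terms) if sup_terms else 0.0
    return w_sup * sup + w_ssl * ssl_loss + w_rec * rec_loss
```

For example, with `logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]`, `labels = [0, None]`, `ssl_loss=0.4` and `rec_loss=0.1`, only the first sample contributes a cross-entropy term (about 0.241), giving a total of roughly 0.741.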

## Contribution

The paper presents a unified ViT framework with a reconstruction branch that maps augmented signals back to their originals, anchoring the encoder to fine-grained I/Q structure and enabling effective AMR with limited labels.
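The reconstruction branch can be illustrated by how a training pair is formed. The encoder and decoder see an augmented frame and are penalized for the distance between their output and the clean original. The augmentations below (random phase rotation plus additive complex Gaussian noise), the `noise_std` value, and the helper names are hypothetical choices for illustration, not the paper's documented pipeline.

```python
import cmath
import random


def augment(iq, rng, noise_std=0.05):
    """Hypothetical augmentation: random carrier-phase rotation plus complex AWGN.

    iq is a list of complex baseband samples (I + jQ); noise_std is the
    standard deviation of the Gaussian noise added to each of I and Q.
    """
    rot = cmath.exp(1j * rng.uniform(0.0, 2.0 * cmath.pi))
    return [s * rot + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            for s in iq]


def reconstruction_loss(reconstructed, original):
    """Mean squared error over all I and Q values; the target is the clean frame."""
    if len(reconstructed) != len(original):
        raise ValueError("frames must have the same length")
    # |r - o|^2 = (dI)^2 + (dQ)^2, so divide by 2N to average over 2N real values.
    se = sum(abs(r - o) ** 2 for r, o in zip(reconstructed, original))
    return se / (2 * len(original))
```

In a training loop one would compute `x_aug = augment(x, rng)`, pass `x_aug` through the encoder and decoder to obtain `x_hat`, and score it with `reconstruction_loss(x_hat, x)`. Because the target is the un-augmented `x`, the encoder is pushed to retain the signal structure that the augmentation perturbs.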

## Key findings

- Outperforms supervised CNN and ViT baselines in low-label regimes
- Approaches ResNet-level accuracy with only 15-20% labeled data on RML2018.01A
- Maintains strong performance across varying SNR levels
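Results at 15-20% labeled data depend on how the labeled subset is drawn. A common approach, assumed here rather than taken from the paper, is to sample the same fraction from every class so that no modulation type is left without labels. `stratified_label_subset` is a hypothetical helper; stratifying on (class, SNR) pairs instead of class alone would be a natural variant for RML2018.01A.

```python
import random
from collections import defaultdict


def stratified_label_subset(labels, fraction, seed=0):
    """Hypothetical helper: return sorted indices of a class-balanced labeled subset.

    Keeps round(fraction * n_c) examples (at least one) from each class c;
    every other example is treated as unlabeled during fine-tuning.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    chosen = []
    for idxs in by_class.values():
        k = max(1, round(fraction * len(idxs)))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)
```

For instance, with 24 classes of 100 examples each and `fraction=0.15`, the function returns 360 indices, 15 per class.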

## Abstract

Automatic modulation recognition (AMR) is critical for cognitive radio, spectrum monitoring, and secure wireless communication. However, existing solutions often rely on large labeled datasets or multi-stage training pipelines, which limit scalability and generalization in practice. We propose a unified Vision Transformer (ViT) framework that integrates supervised, self-supervised, and reconstruction objectives. The model combines a ViT encoder, a lightweight convolutional decoder, and a linear classifier; the reconstruction branch maps augmented signals back to their originals, anchoring the encoder to fine-grained I/Q structure. This strategy promotes robust, discriminative feature learning during pretraining, while partial label supervision in fine-tuning enables effective classification with limited labels. On the RML2018.01A dataset, our approach outperforms supervised CNN and ViT baselines in low-label regimes, approaches ResNet-level accuracy with only 15-20% labeled data, and maintains strong performance across varying SNR levels. Overall, the framework provides a simple, generalizable, and label-efficient solution for AMR.
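To feed one-dimensional I/Q frames to a ViT encoder, each frame has to be split into a sequence of tokens. The sketch below assumes non-overlapping time patches in which each token concatenates the I and Q samples of its window. The `patch_len=16` default and the function name are assumptions, since the summary does not state the paper's tokenization. With 1024-sample frames, as in RML2018.01A, this yields 64 tokens of 32 values each.

```python
def iq_to_patch_tokens(i_samples, q_samples, patch_len=16):
    """Split a 2 x N I/Q frame into non-overlapping time patches.

    Each token is [I_t .. I_{t+P-1}, Q_t .. Q_{t+P-1}], i.e. 2 * patch_len
    values. patch_len=16 is an illustrative assumption, not the paper's setting.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    if len(i_samples) != len(q_samples):
        raise ValueError("I and Q must have the same length")
    n = len(i_samples)
    if n % patch_len != 0:
        raise ValueError("frame length must be a multiple of patch_len")
    return [list(i_samples[t:t + patch_len]) + list(q_samples[t:t + patch_len])
            for t in range(0, n, patch_len)]
```

In a standard ViT, each token would then be linearly projected to the model dimension and combined with a position embedding before entering the transformer layers.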

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20193/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20193/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/2508.20193/full.md

---
Source: https://tomesphere.com/paper/2508.20193