# HASwinNet: A Swin Transformer-Based Denoising Framework with Hybrid Attention for mmWave MIMO Systems

**Authors:** Xi Han, Houya Tu, Jiaxi Ying, Junqiao Chen, Zhiqiang Xing

PMC · DOI: 10.3390/e28010124 · 2026-01-20

## TL;DR

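HASwinNet is a deep learning framework for denoising channel estimates in mmWave massive MIMO systems aimed at 6G networks. It pairs a hierarchical Swin Transformer encoder with two angular-domain attention branches (sparse token extraction and a DFT/SE/IDFT mask) and is trained with an angular-domain perceptual loss.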

## Contribution

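Introduces HASwinNet, a Swin Transformer-based denoising framework for mmWave MIMO channel estimation. Its hybrid attention consists of a sparse token extraction branch guided by angular-domain significance and an angular-domain refinement branch built from DFT, squeeze-and-excitation, and IDFT operations. The paper also introduces an angular-domain perceptual loss that enforces spectral consistency.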

## Key findings

- In simulations based on the Saleh–Valenzuela channel model, HASwinNet improves NMSE and BER over CNN, LSTM, and U-Net baselines.
- The model effectively exploits angular sparsity and maintains performance under pilot-limited conditions.
- The authors conclude that the results validate HASwinNet's scalability for 6G mmWave backhaul and indicate its potential for ISAC scenarios.
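
The NMSE metric named in the findings above can be made concrete with a short sketch. This summary does not reproduce the paper's exact definition, so the code assumes the conventional form: total squared estimation error divided by total energy of the true channel, optionally reported in dB. The function names `nmse` and `nmse_db` are illustrative and not taken from the paper.

```python
import math


def nmse(h_true, h_est):
    """Normalized MSE: sum |h_est - h_true|^2 / sum |h_true|^2.

    Both arguments are flat sequences of (possibly complex) channel
    coefficients of equal length. This is the conventional definition,
    assumed here rather than quoted from the paper.
    """
    if len(h_true) != len(h_est):
        raise ValueError("channel vectors must have the same length")
    err = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    if ref == 0:
        raise ValueError("reference channel has zero energy")
    return err / ref


def nmse_db(h_true, h_est):
    """NMSE in decibels; a perfect estimate maps to -inf."""
    value = nmse(h_true, h_est)
    return float("-inf") if value == 0 else 10 * math.log10(value)


# Toy usage: a two-tap channel and a slightly perturbed estimate.
h = [1 + 1j, 0.5 - 0.2j]
h_hat = [1.05 + 0.95j, 0.45 - 0.25j]
print(nmse_db(h, h_hat))
```

Lower (more negative) dB values indicate a better estimate, and an all-zero estimate gives exactly 0 dB under this definition.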

## Abstract

Millimeter-wave (mmWave) massive multiple-input, multiple-output (MIMO) systems are a cornerstone technology for integrated sensing and communication (ISAC) in sixth-generation (6G) mobile networks. These systems provide high-capacity backhaul while simultaneously enabling high-resolution environmental sensing. However, accurate channel estimation remains highly challenging due to intrinsic noise sensitivity and clustered sparse multipath structures. These challenges are particularly severe under limited pilot resources and low signal-to-noise ratio (SNR) conditions. To address these difficulties, this paper proposes HASwinNet, a deep learning (DL) framework designed for mmWave channel denoising. The framework integrates a hierarchical Swin Transformer encoder for structured representation learning. It further incorporates two complementary branches. The first branch performs sparse token extraction guided by angular-domain significance. The second branch focuses on angular-domain refinement by applying discrete Fourier transform (DFT), squeeze-and-excitation (SE), and inverse DFT (IDFT) operations. This generates a mask that highlights angularly coherent features. A decoder combines the outputs of both branches with a residual projection from the input to yield refined channel estimates. Additionally, we introduce an angular-domain perceptual loss during training. This enforces spectral consistency and preserves clustered multipath structures. Simulation results based on the Saleh–Valenzuela (S–V) channel model demonstrate that HASwinNet achieves significant improvements in normalized mean squared error (NMSE) and bit error rate (BER). It consistently outperforms convolutional neural network (CNN), long short-term memory (LSTM), and U-Net baselines. Furthermore, experiments with reduced pilot symbols confirm that HASwinNet effectively exploits angular sparsity. The model retains a consistent advantage over baselines even under pilot-limited conditions. 
These findings validate the scalability of HASwinNet for practical 6G mmWave backhaul applications. They also highlight its potential in ISAC scenarios where accurate channel recovery supports both communication and sensing.
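
To illustrate the angular sparsity that the abstract relies on, the sketch below builds a toy two-path channel for a uniform linear array, moves it to the angular domain with a DFT, and evaluates one plausible angular-domain loss. It is not the authors' implementation: the half-wavelength array, the on-grid path directions, and the magnitude-spectrum form of `angular_loss` are all assumptions made for illustration, and the abstract does not specify the actual loss formula.

```python
import cmath
import math


def steering_vector(n_antennas, sin_theta):
    """Half-wavelength ULA response for a path with direction sine sin_theta."""
    return [cmath.exp(1j * math.pi * n * sin_theta) for n in range(n_antennas)]


def dft(x):
    """Unitary DFT: antenna-domain samples -> angular (beamspace) bins."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def angular_loss(h_est, h_true):
    """Mean squared gap between angular magnitude spectra (assumed form)."""
    a_est, a_true = dft(h_est), dft(h_true)
    return sum((abs(e) - abs(t)) ** 2 for e, t in zip(a_est, a_true)) / len(h_true)


# Toy channel: two paths whose directions fall exactly on DFT bins 2 and 5.
N = 16
paths = [(1.0, 2 * 2 / N), (0.5, 2 * 5 / N)]  # (gain, sin_theta), hypothetical
h = [0j] * N
for gain, sin_theta in paths:
    a = steering_vector(N, sin_theta)
    h = [hi + gain * ai for hi, ai in zip(h, a)]

spectrum = dft(h)
dominant = [k for k, v in enumerate(spectrum) if abs(v) > 1e-6]
print(dominant)  # [2, 5]: all energy sits in two of the 16 angular bins

# A perturbed estimate is penalised; a perfect one is not.
h_noisy = [v + 0.1 * (-1) ** i for i, v in enumerate(h)]
print(angular_loss(h, h), angular_loss(h_noisy, h))
```

Real channels have off-grid angles and clustered paths, so energy leaks across neighbouring bins rather than landing in exactly two. The abstract's emphasis on preserving clustered multipath structures suggests this is the kind of angular-domain structure the learned mask and the perceptual loss are meant to capture.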

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12840458/full.md

---
Source: https://tomesphere.com/paper/PMC12840458