On the Role of ViT and CNN in Semantic Communications: Analysis and Prototype Validation

Hanju Yoo; Linglong Dai; Songkuk Kim; Chan-Byoung Chae

arXiv:2306.02759·eess.SP·April 29, 2026·2 cites

TL;DR

This paper introduces a ViT-based model for semantic communications that improves PSNR by 0.5 dB over CNN variants, proposes two analysis measures (average cosine similarity and Fourier analysis), and validates the approach on a real SDR prototype. The code is open source.

Contribution

It presents what the authors describe as the first investigation of the fundamental workings of a semantic communications system, together with a hardware prototype and an open-source implementation.

Findings

01. The ViT-based model achieves a PSNR gain of 0.5 dB over CNN variants.

02. The paper introduces average cosine similarity and Fourier analysis as measures for understanding how the system works.

03. The approach is validated over a real wireless channel with an SDR prototype.
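To put the 0.5 dB figure in context: PSNR is conventionally defined as 10·log10(MAX²/MSE), so a fixed gain in dB corresponds to a fixed ratio of mean squared errors. The sketch below is illustrative only. The function names and the sample pixel values are hypothetical and are not taken from the paper or its repository.

```python
import math


def mse(reference, reconstruction):
    """Mean squared error between two equal-length pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    total = sum((r - x) ** 2 for r, x in zip(reference, reconstruction))
    return total / len(reference)


def psnr_db(reference, reconstruction, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE)."""
    error = mse(reference, reconstruction)
    if error == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / error)


def mse_ratio_for_gain(gain_db):
    """MSE ratio (new / old) implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


# Hypothetical 8-bit pixel values, for illustration only.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [54, 55, 60, 67, 69, 63, 64, 71]

example_psnr = psnr_db(original, decoded)
ratio = mse_ratio_for_gain(0.5)
```

Under this definition, a 0.5 dB gain corresponds to an MSE ratio of 10^(-0.05) ≈ 0.891, which is roughly an 11% reduction in mean squared error.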

Abstract

Semantic communications have shown promising advancements by optimizing source and channel coding jointly. However, the dynamics of these systems remain understudied, limiting research and performance gains. Inspired by the robustness of Vision Transformers (ViTs) in handling image nuisances, we propose a ViT-based model for semantic communications. Our approach achieves a peak signal-to-noise ratio (PSNR) gain of +0.5 dB over convolutional neural network variants. We introduce novel measures, average cosine similarity and Fourier analysis, to analyze the inner workings of semantic communications and optimize the system's performance. We also validate our approach through a real wireless channel prototype using software-defined radio (SDR). To the best of our knowledge, this is the first investigation of the fundamental workings of a semantic communications system, accompanied by the…
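The abstract names average cosine similarity and Fourier analysis as analysis measures but does not say here how they are computed. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes the average is taken over all pairs in a set of vectors, and it assumes the Fourier analysis can be summarized as the share of spectral energy at high frequencies in a 1-D signal. All function names and inputs are hypothetical.

```python
import cmath
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def average_cosine_similarity(vectors):
    """Mean cosine similarity over all unordered pairs (an assumption)."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def dft_magnitude(signal, k):
    """Magnitude of the k-th bin of a naive discrete Fourier transform."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, x in enumerate(signal)))


def high_frequency_energy_fraction(signal):
    """Share of one-sided spectral energy (bins 0..n/2) above bin n/4."""
    n = len(signal)
    energies = [dft_magnitude(signal, k) ** 2 for k in range(n // 2 + 1)]
    total = sum(energies)
    if total == 0:
        return 0.0
    high = sum(e for k, e in enumerate(energies) if k > n / 4)
    return high / total


# Hypothetical inputs, for illustration only.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
avg_sim = average_cosine_similarity(features)

smooth = [math.cos(2 * math.pi * t / 8) for t in range(8)]  # one slow cycle
rough = [(-1.0) ** t for t in range(8)]                     # fastest alternation
smooth_share = high_frequency_energy_fraction(smooth)
rough_share = high_frequency_energy_fraction(rough)
```

With these inputs, a high average similarity would indicate that the vectors point in nearly the same direction, and the energy share separates the slowly varying signal (close to 0) from the rapidly alternating one (close to 1).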
