# ProMSC-MIS: Prompt-based Multimodal Semantic Communication for Multi-Spectral Image Segmentation

**Authors:** Haoshuo Zhang, Yufei Bo, Meixia Tao

arXiv: 2508.20057 · 2025-08-28

## TL;DR

ProMSC-MIS is a prompt-based multimodal semantic communication framework for multi-spectral image segmentation. It transmits spatially aligned RGB and thermal images over band-limited channels and reduces the required channel bandwidth by 50-70% at the same segmentation performance, while decreasing storage overhead by 26% and computational complexity by 37%.

## Contribution

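It introduces a prompt-based multimodal semantic communication framework with two design elements: unimodal semantic encoders pre-trained with prompt learning and contrastive learning, using features from one modality as prompts for the other, and a semantic fusion module that combines cross-attention with squeeze-and-excitation (SE) networks.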

## Key findings

- Reduces the required channel bandwidth by 50-70% at the same segmentation performance
- Decreases storage overhead by 26% and computational complexity by 37%
- Substantially outperforms conventional image transmission combined with state-of-the-art segmentation methods

## Abstract

Multimodal semantic communication has great potential to enhance downstream task performance by integrating complementary information across modalities. This paper introduces ProMSC-MIS, a novel Prompt-based Multimodal Semantic Communication framework for Multi-Spectral Image Segmentation. It enables efficient task-oriented transmission of spatially aligned RGB and thermal images over band-limited channels. Our framework has two main design novelties. First, by leveraging prompt learning and contrastive learning, unimodal semantic encoders are pre-trained to learn diverse and complementary semantic representations by using features from one modality as prompts for another. Second, a semantic fusion module that combines a cross-attention mechanism and squeeze-and-excitation (SE) networks is designed to effectively fuse cross-modal features. Experimental results demonstrate that ProMSC-MIS substantially outperforms conventional image transmission combined with state-of-the-art segmentation methods. Notably, it reduces the required channel bandwidth by 50%-70% at the same segmentation performance, while also decreasing the storage overhead and computational complexity by 26% and 37%, respectively. Ablation studies also validate the effectiveness of the proposed pre-training and semantic fusion strategies. Our scheme is highly suitable for applications such as autonomous driving and nighttime surveillance.
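
The abstract names cross-attention and squeeze-and-excitation (SE) as the two building blocks of the semantic fusion module but does not give the architecture. The sketch below only illustrates how those two generic operations could be chained on spatially aligned RGB and thermal feature tokens; it is not the authors' implementation. Everything specific is an assumption made for illustration: the function names (`cross_attention`, `se_gate`, `fuse`), the identity query/key/value projections, the bidirectional attention followed by channel concatenation, the tiny dimensions, and the random weights standing in for learned parameters.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query token attends to all context tokens.

    Assumption: learned query/key/value projections are omitted (identity).
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, context)) for c in range(dim)])
    return out


def se_gate(tokens, w_reduce, w_expand):
    """Squeeze-and-excitation: rescale each channel by a sigmoid gate."""
    n, channels = len(tokens), len(tokens[0])
    # Squeeze: average each channel over all tokens (spatial positions).
    squeezed = [sum(t[c] for t in tokens) / n for c in range(channels)]
    # Excitation: bottleneck layer with ReLU, then expansion layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    return [[g * x for g, x in zip(gates, t)] for t in tokens], gates


def fuse(rgb, thermal, w_reduce, w_expand):
    """Hypothetical fusion: cross-attend both ways, concatenate channels, apply SE."""
    rgb_ctx = cross_attention(rgb, thermal)      # RGB queries, thermal context
    thermal_ctx = cross_attention(thermal, rgb)  # thermal queries, RGB context
    # Tokens are spatially aligned, so token i of each modality is concatenated.
    merged = [r + t for r, t in zip(rgb_ctx, thermal_ctx)]
    return se_gate(merged, w_reduce, w_expand)


# Toy inputs: random values stand in for encoder features and learned weights.
rng = random.Random(0)
N_TOKENS, DIM, HIDDEN = 4, 3, 2
rgb_feat = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
thermal_feat = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
w_reduce = [[rng.gauss(0, 1) for _ in range(2 * DIM)] for _ in range(HIDDEN)]
w_expand = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(2 * DIM)]

fused, gates = fuse(rgb_feat, thermal_feat, w_reduce, w_expand)
print(len(fused), len(fused[0]))  # prints "4 6": 4 tokens, 2 * DIM fused channels
```

In this sketch the cross-attention step lets each modality pull in information from the other, and the SE gate then weights the concatenated channels globally, so channels that carry little useful signal can be attenuated before the fused features are passed on.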

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20057/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20057/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/2508.20057/full.md

---
Source: https://tomesphere.com/paper/2508.20057