# Multiscale scene parsing network

**Authors:** YuanYuan Wang, Zining Zhao, Yilin Liu, Jibin Wang, Haiyan Zhang, Jiajun Wang, Luyue Liu

PMC · DOI: 10.1038/s41598-025-29315-5 · Scientific Reports · 2025-12-02

## TL;DR

This paper introduces MSPNet, a lightweight scene parsing network that improves segmentation accuracy while remaining efficient enough to run on mobile devices.

## Contribution

MSPNet introduces the Efficient Pixel Localization Attention (EPLA) module and combines it with a StarNet backbone for efficient, accurate multiscale scene parsing.
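
StarNet's core building block is the "star operation": two linear projections of the same input, fused by element-wise multiplication. Each output then mixes pairwise products x_i·x_j, which is what gives the backbone its cheap low-to-high dimensional feature transformation. The sketch below is a minimal pure-Python illustration under stated assumptions: the toy weights are made up, and the helper names (`linear`, `star_op`, `implicit_dim`) are hypothetical, not taken from the MSPNet or StarNet code.

```python
from itertools import combinations_with_replacement


def linear(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def star_op(w1, b1, w2, b2, x):
    """Star operation: element-wise product of two linear branches of the same input."""
    return [a * c for a, c in zip(linear(w1, b1, x), linear(w2, b2, x))]


def implicit_dim(d):
    """Count the distinct pairwise terms x_i * x_j (i <= j) an output can mix, ignoring biases."""
    return len(list(combinations_with_replacement(range(d), 2)))


# Hypothetical toy input and weights (illustrative only).
x = [1.0, 2.0, 3.0]
w1 = [[0.5, -1.0, 0.0], [1.0, 0.0, 1.0]]
b1 = [0.0, 1.0]
w2 = [[1.0, 1.0, 1.0], [0.0, 2.0, -1.0]]
b2 = [0.5, 0.0]

print(star_op(w1, b1, w2, b2, x))
print(implicit_dim(3), implicit_dim(64))
```

The pairwise-term count grows as d(d+1)/2, so a 64-channel input already spans 2080 implicit feature terms without any explicit widening of the layer.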

## Key findings

- MSPNet achieves 87.19% mIoU on the Pascal VOC2012 validation set, a 1.79% improvement over PSPNet.
- The ELA submodule of EPLA reduces attention computation overhead by 38% while improving pixel-level feature localization.
- MSPNet outperforms lightweight SOTA models in both accuracy and efficiency.
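
The headline metric, mIoU, is the per-class Intersection over Union, TP / (TP + FP + FN), averaged over classes. For Pascal VOC the confusion matrix is normally accumulated over the whole evaluation set and covers 21 classes, including background. A minimal sketch on a made-up 3-class confusion matrix follows; the counts are invented and are not MSPNet results. The last lines derive the PSPNet baseline implied by the reported figures, assuming the 1.79% gain is in absolute percentage points.

```python
import math


def per_class_iou(conf):
    """conf[i][j] counts pixels whose ground truth is class i and prediction is class j."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class-c pixels labelled as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # other pixels labelled as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else math.nan)  # NaN if the class never occurs
    return ious


def mean_iou(conf):
    """Average IoU over classes that occur in the ground truth or the predictions."""
    valid = [v for v in per_class_iou(conf) if not math.isnan(v)]
    return sum(valid) / len(valid)


# Toy 3-class confusion matrix (made-up counts, not MSPNet results).
conf = [
    [50, 2, 3],
    [4, 30, 1],
    [1, 0, 9],
]
print([round(v, 4) for v in per_class_iou(conf)])
print(round(mean_iou(conf), 4))

# PSPNet baseline implied by the reported numbers, assuming the 1.79 gain
# is an absolute difference in percentage points.
pspnet_miou = round(87.19 - 1.79, 2)
print(pspnet_miou)
```

Under that assumption the implied PSPNet baseline is 85.40% mIoU.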

## Abstract

To address the core challenge faced by existing lightweight scene parsing networks, namely balancing the precision of multiscale feature representation against computational efficiency (rather than simply the difficulty of extracting multiscale information), this paper proposes MSPNet, a lightweight multiscale scene parsing network. The network adopts StarNet as the backbone to leverage its efficient low-to-high dimensional feature transformation, and embeds the Efficient Pixel Localization Attention (EPLA) module into the PSPNet architecture. Rather than simply stacking modules, EPLA integrates two synergistic submodules: ELA (Efficient Localization Attention) and PagFM (Pyramid Attention-Guided Feature Module). ELA uses a dynamic weight allocation mechanism to achieve precise pixel-level feature localization while reducing attention computation overhead by 38%; PagFM builds a hierarchical pyramid fusion architecture that adaptively guides cross-scale feature integration to strengthen small-target representation. MSPNet additionally uses depthwise separable convolutions and channel reparameterization to make the model more compact. On the Pascal VOC2012 validation set, MSPNet achieves a mean Intersection over Union (mIoU) of 87.19%, a 1.79% improvement over PSPNet. With a computational cost (9.7 GFLOPs for the StarNet-s4 backbone) and parameter count (7.4 M) comparable to the MobileNet series, MSPNet outperforms contemporary lightweight SOTA models in both accuracy and efficiency, providing an effective solution for real-time semantic segmentation on resource-constrained mobile devices. The code for MSPNet is available at https://github.com/Eric-863/MSPnet.
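
Part of MSPNet's compactness is attributed to depthwise separable convolutions. The saving follows from simple counting: a standard k×k convolution from C_in to C_out channels has k·k·C_in·C_out weights, while a depthwise k×k convolution (one filter per input channel) followed by a 1×1 pointwise convolution has k·k·C_in + C_in·C_out. Their ratio is 1/C_out + 1/k². The sketch below ignores bias terms and uses an arbitrary 3×3, 256-channel layer and a 64×64 feature map; none of these sizes are taken from the MSPNet architecture, and the function names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def macs(params, h, w):
    """Multiply-accumulates of a stride-1, 'same'-padded conv on an h x w feature map."""
    return params * h * w


k, c_in, c_out = 3, 256, 256  # illustrative layer size, not from MSPNet
std = standard_conv_params(k, c_in, c_out)
dws = depthwise_separable_params(k, c_in, c_out)
print(std, dws, round(dws / std, 4))
print(macs(std, 64, 64), macs(dws, 64, 64))
```

For this layer the separable version needs about 11.5% of the weights and multiply-accumulates of the standard convolution (1/256 + 1/9), which is why the technique is common in MobileNet-class models.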

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12764455/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12764455/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC12764455/full.md

---
Source: https://tomesphere.com/paper/PMC12764455