# GreenViT: A Vision Transformer with Single-Path Progressive Upsampling for Urban Green-Space Segmentation and Auditable Area Estimation

**Authors:** Ziqiang Xu, Young Choi, Changyong Yi, Chanjeong Park, Jinyoung Park, Hyungkeun Park, Sujeen Song

PMC · DOI: 10.3390/jimaging12020072 · Journal of Imaging · 2026-02-10

## TL;DR

GreenViT is a Vision Transformer framework that segments urban green space in high-resolution satellite images and reports auditable green-area estimates, reaching 0.9200 mIoU and 1.10% relative area error.

## Contribution

GreenViT couples a ViT-L/14 backbone with a lightweight single-path, progressive upsampling decoder (Green Head) and a calibrated area estimator, targeting precise and auditable urban green-space segmentation.
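To make the resolution gap that the decoder has to close concrete, here is a minimal shape-arithmetic sketch. It assumes a 224 × 224 input and the 14-pixel patch size implied by ViT-L/14. The doubling schedule and the final resize step are illustrative assumptions: this summary does not state how many stages the Green Head uses or what factor each applies. The function names `token_grid` and `upsampling_plan` are hypothetical.

```python
def token_grid(image_size=224, patch_size=14):
    """Return (tokens per side, total patch tokens) for a square ViT input."""
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side, side * side


def upsampling_plan(start, target, factor=2):
    """Side lengths after each stage of a single-path progressive upsampler.

    Assumption: each stage multiplies the side by `factor` while that does
    not overshoot `target`; a last resize then lands exactly on `target`.
    """
    sizes = [start]
    while sizes[-1] * factor <= target:
        sizes.append(sizes[-1] * factor)
    if sizes[-1] != target:
        sizes.append(target)  # final resize to the output resolution
    return sizes


side, tokens = token_grid()
print(side, tokens)                # 16 256
print(upsampling_plan(side, 224))  # [16, 32, 64, 128, 224]
```

Under these assumptions the backbone emits a 16 × 16 token grid, so the decoder must recover a 14× increase in side length before per-pixel class labels can be assigned.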

## Key findings

- Under five-fold evaluation on 62,650 patches drawn from 20 manually annotated satellite images, GreenViT achieved 0.9200 ± 0.0243 mIoU, 0.9580 ± 0.0135 Dice, and 0.9570 pixel accuracy (PA).
- The framework's calibrated estimator achieved 1.10% relative area error for green-space quantification.
- GreenViT balances accuracy and efficiency, especially for thin or boundary-rich urban green-space classes.
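For readers unfamiliar with the reported metrics, the sketch below computes mIoU, mean Dice, and pixel accuracy from a confusion matrix, plus a relative area error. It uses the standard textbook definitions, which may differ in detail from the paper's evaluation code (for example, in how absent classes are handled or how scores are averaged). The function names and toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return (mIoU, mean Dice, pixel accuracy) from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    ious, dices = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[r][k] for r in range(n)) - tp
        if tp + fp + fn == 0:
            continue  # class absent from both labels and predictions
        ious.append(tp / (tp + fp + fn))
        dices.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(dices) / len(dices), correct / total


def relative_area_error(estimated_area, reference_area):
    """Absolute area difference as a fraction of the reference area."""
    return abs(estimated_area - reference_area) / reference_area


# Toy flattened label maps with three classes (not data from the paper).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
miou, dice, pa = segmentation_metrics(confusion_matrix(y_true, y_pred, 3))
print(round(miou, 4), round(dice, 4), round(pa, 4))  # 0.5 0.6667 0.6667
```

A relative area error of 1.10% corresponds to `relative_area_error(...)` returning about 0.011, for instance an estimate of 98.9 units against a reference of 100.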

## Abstract

Urban green-space monitoring in dense cityscapes remains limited by accuracy–efficiency trade-offs and the absence of integrated, auditable area estimation. We introduce GreenViT, a Vision Transformer (ViT)-based framework for precise segmentation and transparent quantification of urban green space. GreenViT couples a ViT-L/14 backbone with a lightweight single-path, progressive upsampling decoder (Green Head), preserving global context while recovering thin structures. Experiments were conducted on a manually annotated dataset of 20 high-resolution satellite images collected from Satellites.Pro, covering five land-cover classes (background, green space, building, road, and water). Using a 224 × 224 sliding-window sampling scheme, the 20 images yield 62,650 training/validation patches. Under five-fold evaluation, GreenViT attains 0.9200 ± 0.0243 mIoU, 0.9580 ± 0.0135 Dice, and 0.9570 PA, and the calibrated estimator achieves 1.10% relative area error. Overall, GreenViT strikes a strong balance between accuracy and efficiency and is particularly well suited to thin or boundary-rich classes. It can support planning evaluations, green-space statistics, urban renewal assessments, and ecological red-line verification, while providing reliable green-area metrics for urban heat mitigation and pollution control efforts. These properties make it suitable for decision-oriented long-term monitoring and management assessments.
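The 224 × 224 sliding-window sampling mentioned in the abstract can be sketched as follows. The stride of 112 pixels and the extra edge-aligned window are assumptions chosen for illustration. The abstract gives neither the stride nor the image dimensions, so this sketch does not attempt to reproduce the 62,650-patch count.

```python
def window_origins(length, window=224, stride=112):
    """Start offsets of windows along one axis, always covering the far edge."""
    if length < window:
        raise ValueError("image side is smaller than the window")
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] != length - window:
        origins.append(length - window)  # extra window flush with the edge
    return origins


def sliding_windows(height, width, window=224, stride=112):
    """Top-left (row, col) corners of every window over a height x width image."""
    return [
        (top, left)
        for top in window_origins(height, window, stride)
        for left in window_origins(width, window, stride)
    ]


print(window_origins(500))             # [0, 112, 224, 276]
print(len(sliding_windows(448, 500)))  # 12
```

Overlapping windows enlarge the patch set and let boundary pixels appear in several contexts. One consequence is that patches cut from the same image are correlated, which matters when deciding whether to split cross-validation folds by patch or by image.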

## Full-text entities

- **Diseases:** cardiovascular diseases (MESH:D002318), mental health disorders (OMIM:603663), weight loss (MESH:D015431), deaths (MESH:D003643), injury to (MESH:D014947), respiratory diseases (MESH:D012140)
- **Chemicals:** ozone (MESH:D010126), water (MESH:D014867)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12942620/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12942620/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12942620/full.md

---
Source: https://tomesphere.com/paper/PMC12942620