# A multi-label dataset for China’s agricultural and rural scenes classification from VHR satellite imagery

**Authors:** Shiying Yuan, Quanlong Feng, Bowen Niu, Xiaolu Yan, Landi Zheng, Zinuo Hao, Dehai Zhu, Jianyu Yang, Jiantao Liu

PMC · DOI: 10.1038/s41597-026-06800-8 · Scientific Data · 2026-02-07

## TL;DR

This paper introduces China-MAS-50k, a multi-label dataset of very-high-resolution (VHR) satellite images for classifying agricultural and rural scenes across China.

## Contribution

The main contribution is the first VHR satellite dataset for multi-label classification of agricultural and rural scenes covering all of China.

## Key findings

- The dataset contains 55,520 images with 135,289 labels across 18 categories.
- Among the mainstream multi-label classification models evaluated, ResNeXt-101 performed best, with an F1-score of 78.4%.
- Tail categories in the dataset remain challenging for current models.
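The counts above imply an average of roughly 2.44 labels per image (135,289 / 55,520). The following minimal sketch computes that label cardinality and shows, on made-up toy predictions, how a missed rare class lowers a macro-averaged F1 more than a micro-averaged one. This summary does not state which F1 averaging the paper reports, so both variants are shown as assumptions; the function names and toy data are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def label_cardinality(n_labels: int, n_images: int) -> float:
    """Average number of labels per image."""
    return n_labels / n_images


def f1_scores(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for multi-hot label rows.

    Assumption: the averaging scheme used in the paper is not stated in this
    summary, so both common variants are computed here.
    """
    n_classes = len(y_true[0])
    tp: List[int] = [0] * n_classes
    fp: List[int] = [0] * n_classes
    fn: List[int] = [0] * n_classes
    for t_row, p_row in zip(y_true, y_pred):
        for c, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[c] += 1
            elif p and not t:
                fp[c] += 1
            elif t and not p:
                fn[c] += 1

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_classes
    return micro, macro


# Counts reported for China-MAS-50k.
cardinality = label_cardinality(135_289, 55_520)

# Toy example (hypothetical data): the third class is rare and never predicted.
toy_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
toy_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
micro_f1, macro_f1 = f1_scores(toy_true, toy_pred)
```

In the toy case the macro score falls below the micro score because the class that is never predicted contributes an F1 of zero to the macro average regardless of how few samples it has.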

## Abstract

This study releases China-MAS-50k (China Multi-label dataset for Agriculture & rural Scene 50k), the first very-high-resolution (VHR) remote sensing dataset for multi-label classification that covers all of China’s agricultural and rural areas, filling the gap in finely annotated data for non-urban scene recognition. Based on a 50 km grid system, over 50,000 sample points were determined nationwide, and VHR Google Earth imagery was collected at these points for subsequent multi-label annotation. A fine-grained label system comprising 18 categories (e.g., cropland, rural village, greenhouse, and photovoltaic station) was established. Both a rigorously defined visual interpretation system and a labeling procedure including cross-checking and error correction were adopted to maintain annotation quality. The resulting dataset contains 55,520 VHR images with 135,289 labels and exhibits a long-tail distribution, which makes it a challenging benchmark. We also evaluated mainstream multi-label classification models on China-MAS-50k: ResNeXt-101 achieved the best performance, with an F1-score of 78.4%, but showed limitations in recognizing tail categories.
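To make the multi-label setup and the long-tail property concrete, here is a minimal sketch of multi-hot encoding and per-class frequency counting. It uses only the four category names mentioned in the abstract (the full label system has 18 categories, which are not listed in this summary). The toy annotations, the function names, and the max/min imbalance ratio are illustrative assumptions, not the paper's data or tooling.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence

# Only the four categories named in the abstract; the real label system has 18.
CATEGORIES = ["cropland", "rural village", "greenhouse", "photovoltaic station"]


def to_multi_hot(labels: Iterable[str], categories: Sequence[str] = CATEGORIES) -> List[int]:
    """Encode one image's label set as a 0/1 vector ordered like `categories`."""
    present = set(labels)
    unknown = present - set(categories)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in present else 0 for c in categories]


def class_counts(
    annotations: Iterable[Iterable[str]], categories: Sequence[str] = CATEGORIES
) -> Dict[str, int]:
    """Count how many images carry each label (duplicates within an image ignored)."""
    counts = Counter(label for labels in annotations for label in set(labels))
    return {c: counts.get(c, 0) for c in categories}


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Ratio of the most frequent to the least frequent non-empty class."""
    nonzero = [n for n in counts.values() if n > 0]
    return max(nonzero) / min(nonzero)


# Hypothetical annotations for six images, skewed toward one head class.
toy_annotations = [
    ["cropland"],
    ["cropland", "rural village"],
    ["cropland", "greenhouse"],
    ["cropland", "rural village"],
    ["cropland", "photovoltaic station"],
    ["rural village"],
]

encoded = to_multi_hot(["cropland", "greenhouse"])
counts = class_counts(toy_annotations)
ratio = imbalance_ratio(counts)
```

A large ratio between head and tail class counts is one simple indicator of a long-tail distribution; the paper's own per-category statistics are in the full text.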

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12992661/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12992661/full.md

## References

5 references — full list in the complete paper: https://tomesphere.com/paper/PMC12992661/full.md

---
Source: https://tomesphere.com/paper/PMC12992661