# Multi-step chestnut physical characteristics classification model based on vision transformation using a single-view RGB image

**Authors:** Tae Hyong Kim, Ki Hyun Kwon, Ah-Na Kim

PMC · DOI: 10.1038/s41598-025-34787-6 · Scientific Reports · 2026-02-11

## TL;DR

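This paper proposes a k-means clustering and vision transformer (ViT) method that classifies chestnuts by cultivar (five classes), size grade (two classes), and rottenness state (two classes) from a single-view RGB image, with the goal of replacing labor-intensive manual sorting.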

## Contribution

A novel k-means clustering–vision transformer framework for multi-step chestnut classification using single-view images.

## Key findings

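- ViT outperformed all three CNN baselines (DarkNet-53, ResNet-50, and EfficientNetB0) across all classification tasks; among the CNNs, DarkNet-53 ranked highest, followed by ResNet-50 and EfficientNetB0.
- K-means clustering was applied as a preprocessing step to segment chestnut regions in the 17,797 images before training.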
- The proposed framework shows potential for scalable and reliable automated sorting in commercial settings.
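
To make the k-means preprocessing step more concrete, here is a minimal sketch of colour-based pixel clustering used to separate an object from its background. It is not the authors' implementation: the number of clusters (`k=2`), the brightness-based centroid initialisation, the border-based rule for choosing the background cluster, and the helper names `kmeans_labels` and `foreground_mask` are all assumptions made for illustration, and a synthetic image stands in for a real chestnut photo.

```python
import numpy as np


def kmeans_labels(image, k=2, iters=20):
    """Cluster the pixels of an H x W x 3 RGB array by colour with plain k-means."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)

    # Deterministic initialisation (an assumption of this sketch): take k pixels
    # evenly spaced along the brightness ordering, from darkest to brightest.
    order = np.argsort(pixels.sum(axis=1))
    centroids = pixels[order[np.linspace(0, len(pixels) - 1, k).astype(int)]]

    for _ in range(iters):
        # Assign every pixel to its nearest centroid (squared Euclidean distance).
        dists = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its pixels; keep it if the cluster is empty.
        new_centroids = centroids.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return labels.reshape(h, w)


def foreground_mask(labels):
    """Hypothetical heuristic: the cluster that dominates the image border is background."""
    border = np.concatenate([labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]])
    background = np.bincount(border).argmax()
    return labels != background


# Synthetic stand-in for a photo: light background with a brown square "nut".
image = np.full((32, 32, 3), 230, dtype=np.uint8)
image[8:24, 8:24] = (120, 70, 30)

mask = foreground_mask(kmeans_labels(image))
segmented = image * mask[:, :, None]  # zero out the background pixels
```

In a pipeline like the one summarised here, the masked image (or a crop around the mask) would then be passed to the classifier. Real photos with shadows or textured backgrounds would likely need more clusters or a different colour space than this sketch assumes.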

## Abstract

Chestnut classification is essential for improving postharvest processing efficiency and supporting large-scale commercialization; however, conventional manual sorting is labor intensive, inconsistent, and unsuitable for high-throughput operations. To address these challenges, this study proposes a k-means clustering–vision transformer (ViT)–based approach for classifying chestnuts into five cultivars, two size grades, and two rottenness states using a single-view RGB image. A total of 17,797 images were preprocessed using k-means clustering to segment chestnut regions, and four deep learning models—ViT, EfficientNetB0, ResNet-50, and DarkNet-53—were trained for multi-class classification. Model performance was evaluated using accuracy, precision, recall, and F1-score. Among the CNN models, DarkNet-53 achieved the highest performance, followed by ResNet-50 and EfficientNetB0. The ViT model outperformed all CNN models across all classification tasks, demonstrating superior pattern-recognition capability likely attributable to its self-attention mechanism, which effectively captures global contextual relationships within images. These results indicate that the proposed k-means–ViT framework provides a highly accurate and efficient solution for automated chestnut sorting. The approach shows strong potential for enhancing industrial grading systems by enabling reliable, scalable, and data-driven quality assessment.
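
The four evaluation metrics named in the abstract can all be derived from a confusion matrix. The sketch below shows one common way to do so; the abstract does not say how per-class scores were averaged, so the macro averaging used here is an assumption, as are the helper names and the toy labels (0 = sound, 1 = rotten), which are not data from the paper.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def classification_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs

    # Per-class scores; a class with no predictions or no samples scores 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),  # macro average (assumption of this sketch)
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Toy labels for a two-class task (hypothetical: 0 = sound, 1 = rotten).
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]

cm = confusion_matrix(y_true, y_pred, n_classes=2)
metrics = classification_metrics(cm)
```

The same two functions apply unchanged to the five-class cultivar task by passing `n_classes=5`.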

## Full-text entities

- **Diseases:** disease (MESH:D004194)
- **Chemicals:** Chestnut (-), sugar (MESH:D000073893)
- **Species:** Castanea mollissima (Chinese chestnut, species) [taxon 60419], Cyphia crenata (species) [taxon 2041116], Castanea (genus) [taxon 21019], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Homo sapiens (human, species) [taxon 9606], Solanum tuberosum (potatoes, species) [taxon 4113]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12894714/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12894714/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12894714/full.md

---
Source: https://tomesphere.com/paper/PMC12894714