# An open image dataset of Indonesian soybean seed varieties (Anjasmoro, Grobogan, DEGA-1) for agricultural research and machine learning applications

**Authors:** Diana Sofia Hanafiah, Rahmatika Alfi, Anggria Lestami, Fanindia Purnamasari, Rossy Nurhasanah, Muhammad Ariyo Syahraza, Muhammad Azis Saputra, Usman Ismail Pane, Steven Manurung, Keisya, Yunus Tio Buntoro, Josua Peter Corda, Gali Rakasiwi

PMC · DOI: 10.1016/j.dib.2026.112524 · 2026-02-03

## TL;DR

This paper introduces a new open dataset of high-resolution images of three Indonesian soybean varieties to support agricultural research and machine learning applications.

## Contribution

The novel contribution is the creation and release of a standardized open image dataset for Indonesian soybean seeds to enable automated identification and analysis.

## Key findings

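- The dataset includes high-resolution images (6800 × 9359 pixels, scanned at 800 dpi on an Epson L360 flatbed scanner, saved as JPG) of three Indonesian soybean varieties: Anjasmoro, Grobogan, and DEGA-1.
- No manual segmentation masks are released in this version; automated seed segmentation instead relies on DeepLab V3+ with a MobileNet backbone.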
- The dataset is intended for use in computer vision tasks and agricultural research.

## Abstract

Soybean (Glycine max L.) plays an important role as a main source of protein in Indonesia. Its quality and productivity can be assessed from the characteristics of its seed. Accordingly, identifying varieties by observing soybean seed traits is a crucial step in plant breeding and quality assurance. Conventional approaches rely on manual observation, which is subjective, time-consuming, and prone to human error. With advances in artificial intelligence, automated seed identification has emerged as a potential solution. However, progress is constrained by the lack of open and standardized image datasets, especially for locally bred varieties in developing countries. To address this gap, we present an open image dataset of Indonesian soybean seeds from three widely cultivated varieties developed through plant breeding: Anjasmoro, Grobogan, and DEGA-1. The dataset consists of high-resolution seed images captured with an Epson L360 flatbed scanner, with the optical resolution fixed at 800 dots per inch, yielding images of 6800 × 9359 pixels. All raw images are saved in JPG format. No manually annotated segmentation masks are released in this version; instead, DeepLab V3+ with a MobileNet backbone is used to enable automated seed image segmentation. The curated dataset is intended to support a broad range of applications, including computer vision tasks such as image classification and segmentation, as well as research in plant breeding, seed quality assessment, and agricultural informatics. By providing a standardized and publicly accessible resource, this dataset contributes to the advancement of interdisciplinary studies at the intersection of agriculture and artificial intelligence.
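
The scan geometry stated in the abstract (800 dpi, 6800 × 9359 pixels) fixes the physical scale of each image, which is useful when converting seed measurements from pixels to millimetres or when budgeting memory for a training pipeline. The following is a minimal sketch, not part of the paper: the function names are hypothetical, and it assumes the 800 dpi resolution applies equally along both axes and that decoded images are 8-bit RGB (three channels). Under those assumptions each pixel spans about 0.032 mm, a full scan covers roughly 215.9 mm × 297.1 mm, and one decoded image occupies about 191 MB.

```python
MM_PER_INCH = 25.4  # exact, by definition of the inch


def pixels_to_mm(pixels: float, dpi: float = 800) -> float:
    """Convert a length in pixels to millimetres at the given scan resolution."""
    return pixels / dpi * MM_PER_INCH


def raw_rgb_bytes(width_px: int, height_px: int, channels: int = 3) -> int:
    """Size in bytes of one decoded image, assuming 8 bits per channel."""
    return width_px * height_px * channels


# Image dimensions and resolution as reported in the abstract.
WIDTH_PX, HEIGHT_PX, DPI = 6800, 9359, 800

pixel_pitch_mm = pixels_to_mm(1, DPI)          # physical size of one pixel
scan_width_mm = pixels_to_mm(WIDTH_PX, DPI)    # physical width of the scan
scan_height_mm = pixels_to_mm(HEIGHT_PX, DPI)  # physical height of the scan
decoded_bytes = raw_rgb_bytes(WIDTH_PX, HEIGHT_PX)

if __name__ == "__main__":
    print(f"pixel pitch: {pixel_pitch_mm:.5f} mm")
    print(f"scan area:   {scan_width_mm:.1f} mm x {scan_height_mm:.1f} mm")
    print(f"decoded RGB: {decoded_bytes / 1e6:.1f} MB per image")
```

For example, a seed whose longest axis spans 250 pixels in a segmentation mask would measure `pixels_to_mm(250)`, a little under 8 mm, under the same assumptions. The per-image memory figure also suggests that tiling or downsampling may be needed before feeding full scans to a network.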

## Full-text entities

- **Chemicals:** oil (MESH:D009821)
- **Species:** Glycine max (soybean, species) [taxon 3847], Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12925431/full.md

---
Source: https://tomesphere.com/paper/PMC12925431