# Agri-vision Bangladesh: A multi-crop augmented image dataset for automated disease diagnosis in Bottle Gourd, Zucchini, Papaya, and Tomato

**Authors:** Md Masum Billah, Md. Anisur Rahman, Saifuddin Sagor, Sanzida Parvin, Mohammad Shorif Uddin

PMC · DOI: 10.1016/j.dib.2026.112528 · 2026-01-29

## TL;DR

Agri-vision Bangladesh is a dataset of 28,000 expert-validated images (5,266 original, 22,734 augmented) for diagnosing diseases in bottle gourd, zucchini, papaya, and tomato, intended to support precision agriculture in subtropical regions.

## Contribution

The paper introduces a region-specific, expert-validated, augmented multi-crop image dataset for automated disease diagnosis in subtropical agriculture.

## Key findings

- The dataset includes 28,000 images covering four crops and 28 classes (viral, fungal, bacterial, and pest-induced damage, plus healthy samples).
- Augmentation expanded the dataset from 5,266 original images to 28,000 (22,734 augmented).
- Each image passed a two-stage validation by senior agronomists, and all files are standardized to 512×512-pixel JPG for deep learning use.
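The reported counts can be cross-checked with a few lines of arithmetic. This sketch uses only the totals stated in the abstract; the summary gives no per-class breakdown, so the per-class figures below are averages, not confirmed per-class counts.

```python
# Reported dataset totals (from the abstract).
ORIGINAL_IMAGES = 5266
TOTAL_IMAGES = 28_000
NUM_CLASSES = 28

# Derived quantities.
augmented_images = TOTAL_IMAGES - ORIGINAL_IMAGES        # images produced by augmentation
expansion_factor = TOTAL_IMAGES / ORIGINAL_IMAGES        # overall growth of the dataset
mean_original_per_class = ORIGINAL_IMAGES / NUM_CLASSES  # average only, not a reported count
mean_total_per_class = TOTAL_IMAGES / NUM_CLASSES        # average only, not a reported count

print(augmented_images)                   # 22734
print(round(expansion_factor, 2))         # 5.32
print(round(mean_original_per_class, 1))  # 188.1
print(mean_total_per_class)               # 1000.0
```

The 22,734 augmented images match the abstract, and the final set is roughly 5.3 times the size of the original collection.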

## Abstract

This article introduces Agri-Vision Bangladesh, a comprehensive, augmented image dataset designed to advance automated disease diagnosis in four economically vital agricultural crops: Bottle Gourd (Lagenaria siceraria), Zucchini (Cucurbita pepo), Papaya (Carica papaya), and Tomato (Solanum lycopersicum). Addressing the scarcity of region-specific agricultural data, a total of 5266 original images were acquired directly from diverse agricultural fields in Bangladesh using a SONY ALPHA 7 II full-frame camera under natural lighting conditions. The dataset encompasses 28 distinct classes, covering a wide spectrum of biotic stressors including viral (Mosaic Virus, Leaf Curl), fungal (Downy Mildew, Anthracnose, Alternaria Blight), bacterial (Bacterial Blight, Xanthomonas), and pest-induced damage (Insect Hole, White Spot), alongside Healthy samples. To ensure scientific reliability, each image underwent a rigorous two-stage validation process by senior agronomists. To tackle class imbalance and facilitate the training of data-intensive deep learning models, the dataset was expanded using a Python-based augmentation pipeline incorporating geometric transformations (rotation, flipping) and photometric adjustments (noise, brightness), resulting in a final repository of 28,000 images (5266 original and 22,734 augmented). All files are standardized to 512×512 pixels in JPG format. This expert-validated resource serves as a critical benchmark for developing robust computer vision algorithms (e.g., CNNs, Vision Transformers) for precision agriculture, enabling research into fine-grained classification, object detection, and cross-crop transfer learning in subtropical farming environments.
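The augmentation step described above can be sketched with NumPy, assuming each image is already decoded into a 512×512×3 `uint8` array. The specific parameters (rotations by multiples of 90°, horizontal flips, Gaussian noise with sigma 10, brightness factors in [0.7, 1.3], a balancing target of 1,000 images per class) and the helper names `augment` and `augmentations_needed` are hypothetical; the abstract names only the transformation families, not the authors' parameters or code. JPEG decoding, encoding, and resizing would be handled by an image library such as Pillow.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> list[np.ndarray]:
    """Return four augmented variants of an HxWx3 uint8 image (hypothetical pipeline)."""
    variants = []
    # Geometric: rotate by a random multiple of 90 degrees (assumed; real angles not stated).
    k = int(rng.integers(1, 4))
    variants.append(np.rot90(image, k=k, axes=(0, 1)))
    # Geometric: horizontal flip (mirror left-right).
    variants.append(np.fliplr(image))
    # Photometric: additive Gaussian noise, clipped back to the valid [0, 255] range.
    noise = rng.normal(0.0, 10.0, size=image.shape)
    variants.append(np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8))
    # Photometric: scale brightness by a random factor in [0.7, 1.3], then clip.
    factor = rng.uniform(0.7, 1.3)
    variants.append(np.clip(image.astype(np.float64) * factor, 0, 255).astype(np.uint8))
    return variants


def augmentations_needed(original_counts: dict[str, int], target_per_class: int) -> dict[str, int]:
    """How many augmented images each class needs to reach a common (hypothetical) target."""
    return {cls: max(target_per_class - n, 0) for cls, n in original_counts.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    # Random pixels stand in for a decoded 512x512 RGB leaf photo.
    leaf = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
    variants = augment(leaf, rng)
    print(len(variants), variants[0].shape, variants[0].dtype)  # 4 (512, 512, 3) uint8
    # Hypothetical original counts for two classes.
    print(augmentations_needed({"tomato_leaf_curl": 240, "papaya_healthy": 130}, 1000))
    # {'tomato_leaf_curl': 760, 'papaya_healthy': 870}
```

Each variant applies a single transform so that every augmented image is traceable to one operation; an actual pipeline may compose several transforms or sample continuous rotation angles instead.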

## Linked entities

- **Species:** Lagenaria siceraria (taxon 3668), Cucurbita pepo (taxon 3663), Carica papaya (taxon 3649), Solanum lycopersicum (taxon 4081)

## Full-text entities

- **Species:** Alternaria sect. Alternaria (section) [taxon 2499237], Carica papaya (mamon, species) [taxon 3649], Lagenaria siceraria (bottle gourd, species) [taxon 3668], Cucurbita melopepo (species) [taxon 3665], Solanum lycopersicum (tomato, species) [taxon 4081], Cucurbita pepo (species) [taxon 3663]

## Figures

The complete paper contains 2 figures with captions: https://tomesphere.com/paper/PMC12907883/full.md

---
Source: https://tomesphere.com/paper/PMC12907883