# CocoaMoniliaDataSet: A cocoa pod dataset to detect and classify Monilia roreri in real conditions

**Authors:** Joan Alvarado, Juan Felipe Restrepo-Arias, David Velásquez, John W. Branch-Bedoya, Mikel Maiza

PMC · DOI: 10.1016/j.dib.2025.112447 · 2026-01-07

## TL;DR

This paper introduces CocoaMoniliaDataSet, a labeled image dataset for detecting and classifying Monilia roreri infection in cocoa pods with computer vision.

## Contribution

The novel contribution is the creation of a labeled dataset for Monilia roreri disease in cocoa pods, supporting real-world computer vision applications.

## Key findings

- The dataset includes 1953 images of cocoa pods labeled with four classes: healthy (h0) and three symptomatic stages of Monilia disease (m1, m2, m3).
- Labels are provided in multiple formats (COCO, YOLO, segmentation masks) to support diverse computer vision algorithms.
- The dataset aims to improve early detection of Monilia roreri, which causes significant yield losses in cocoa production.
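As a first check on the labels, it can help to count how many annotated instances fall into each class. The following is a minimal sketch, assuming the COCO export follows the standard layout (a `categories` list with `id` and `name`, and an `annotations` list with `category_id`) and that the category names match the class codes `h0`, `m1`, `m2`, and `m3`. The function name and the tiny in-memory example are hypothetical, not taken from the dataset.

```python
from collections import Counter


def count_instances_per_class(coco: dict) -> dict:
    """Count annotated instances per category name in a COCO-style dict.

    Assumes the standard COCO keys "categories" and "annotations".
    Categories with no annotations are reported with a count of 0.
    """
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return {name: counts.get(name, 0) for name in id_to_name.values()}


# Hypothetical miniature annotation set, for illustration only.
example = {
    "categories": [
        {"id": 1, "name": "h0"},
        {"id": 2, "name": "m1"},
        {"id": 3, "name": "m2"},
        {"id": 4, "name": "m3"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 3},
        {"id": 12, "image_id": 2, "category_id": 3},
    ],
}

per_class = count_instances_per_class(example)
```

With a real export, the dict would come from `json.load` on the annotation file; the file name depends on how the dataset is packaged.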

## Abstract

Computer vision applications for detecting diseases in agriculture have been gaining relevance in recent years through the use of deep learning architectures. Digital image datasets serve as the main input for these architectures, enabling the analysis of patterns associated with a specific disease. However, some diseases have not yet been explored due to the limited availability of annotated image datasets. Cocoa pods are fundamental for the production of chocolate and its derived products; nevertheless, their production is threatened by Monilia roreri, a fungal disease responsible for yield losses of approximately 30% to 40%. Therefore, this paper proposes CocoaMoniliaDataSet, a dataset of cocoa pods labeled across symptomatic stages of Monilia disease. Although the infection of cocoa pods caused by Monilia roreri progresses through four biological cycles, the dataset groups the visual symptoms into three classes to support computer vision tasks. These symptoms correspond to: cycle 1 (humps), cycle 2 (brown/oily spot), and cycle 3 (white powder or sporulation). In this paper, cycle 2 represents a symptomatic stage that merges the second and third biological cycles. In addition, the dataset includes healthy cocoa pods to facilitate early detection of the disease. The dataset comprises 1953 images with four labeled classes: (1) healthy cocoa pod, labeled as (h0); (2) first Monilia cycle, humps, labeled as (m1); (3) second and third Monilia cycles, labeled as (m2); and (4) fourth Monilia cycle, labeled as (m3). Each instance in the image was annotated using the polygon method in CVAT (Computer Vision Annotation Tool), and the resulting labels are provided in COCO 1.0, YOLO, and segmentation mask 1.1 formats to enable training of object detection algorithms using bounding boxes. The publication of this dataset is essential for exploring techniques to diagnose cocoa diseases using computer vision.
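The abstract describes polygon annotations that are exported for bounding-box detectors. The relationship between the two can be sketched as follows: a polygon's axis-aligned bounding box is computed from its extreme coordinates and written as a YOLO label line (class index, then normalized center x, center y, width, and height). This is an illustrative sketch, not the authors' export pipeline; the class-index order in `CLASS_IDS` and the function name are assumptions, and the actual index order should be read from the dataset's own class list.

```python
# Assumed mapping from class code to YOLO class index (hypothetical order).
CLASS_IDS = {"h0": 0, "m1": 1, "m2": 2, "m3": 3}


def polygon_to_yolo(label: str, polygon: list, img_w: int, img_h: int) -> str:
    """Convert a polygon of (x, y) pixel points to one YOLO label line.

    The box is the tightest axis-aligned rectangle around the polygon,
    with center and size normalized by the image width and height.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{CLASS_IDS[label]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical polygon around a pod with humps (m1) in a 400x500 image.
line = polygon_to_yolo("m1", [(100, 50), (300, 50), (300, 250), (100, 250)], 400, 500)
```

The conversion discards the polygon outline, which is why the segmentation mask export remains the format to use when the pod shape itself matters.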

## Full-text entities

- **Diseases:** infection (MESH:D007239), fungal disease (MESH:D009181), Monilia disease (MESH:D004194)
- **Species:** Theobroma cacao (cacao, species) [taxon 3641]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12834835/full.md

---
Source: https://tomesphere.com/paper/PMC12834835