# Numerical Signature Dataset of Curculionidae and Tenebrionidae Beetle Fragments for ML Identification

**Authors:** Ronnie O. Serfa Juan, Alison R. Gerken

PMC · DOI: 10.1038/s41597-025-06309-6 · 2025-12-12

## TL;DR

This data descriptor introduces a dataset of numerical signatures computed from images of beetle fragments, intended to support machine learning identification of stored-product pest species.

## Contribution

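The paper contributes a curated, labeled dataset of numerical signatures (skewness, kurtosis, entropy, and standard deviation) extracted from fragment images of six stored-product beetle species, for use in machine learning-based pest identification.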

## Key findings

- The dataset includes 3,423 fragment images with numerical signatures for six beetle species.
- Four statistical descriptors (skewness, kurtosis, entropy, and standard deviation) capture fragment-level texture and morphological variation.
- The dataset follows FAIR principles for open reuse in entomological AI research.
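
To make the shape of one dataset entry concrete, here is a minimal sketch of how a record might be represented in Python, assuming each image carries a species label, an anatomical region label, and the four descriptors named in the abstract. The class name `FragmentRecord`, every field name, and the sample values are hypothetical and are not taken from the published files.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentRecord:
    # All field names are hypothetical; the published files may use other names.
    image_id: str
    species: str
    region: str
    skewness: float
    kurtosis: float
    entropy: float
    std_dev: float

    def signature(self) -> tuple[float, float, float, float]:
        """Return the four descriptors as one feature vector."""
        return (self.skewness, self.kurtosis, self.entropy, self.std_dev)


# Placeholder values for illustration only; not taken from the dataset.
records = [
    FragmentRecord("img_0001", "Sitophilus oryzae", "elytra", 0.1, -0.5, 6.2, 30.0),
    FragmentRecord("img_0002", "Sitophilus oryzae", "antennae", 0.4, 0.2, 5.8, 25.0),
    FragmentRecord("img_0003", "Tribolium castaneum", "thorax", -0.2, 1.1, 6.9, 41.0),
]

# Count records per species, e.g. to check class balance before training.
per_species = Counter(r.species for r in records)
```

Keeping the signature as a fixed-length tuple means it can be passed directly to most classifiers as a feature row, with `species` or `region` serving as the target label.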

## Abstract

This data descriptor presents a curated dataset of numerical signature descriptors derived from fragment images of six economically significant stored-product beetle species from the families Curculionidae (Sitophilus zeamais, Sitophilus oryzae, Sitophilus granarius) and Tenebrionidae (Tribolium castaneum, Tribolium confusum, Latheticus oryzae). Anatomical fragments—including antennae, elytra, thorax, snout (Curculionidae), and head aspect ratio (Tenebrionidae)—were imaged using digital microscopy and processed with standardized image acquisition and segmentation techniques. From each image, four statistical descriptors—skewness, kurtosis, entropy, and standard deviation—were extracted, which form compact numerical signatures that capture fragment-level texture and morphological variation. These descriptors are designed to support artificial intelligence and machine learning workflows for automated classification in entomological diagnostics and post-harvest pest detection. The dataset includes 3,423 fragment images, each linked to a numerical signature vector and labeled by species, anatomical region, and metadata. This dataset adheres to Findable, Accessible, Interoperable, Reusable (FAIR) principles and is intended for open reuse in entomological AI research and machine learning-driven insect fragment identification workflows.
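
As a rough illustration of how such a four-value signature could be computed, the sketch below derives skewness, kurtosis, entropy, and standard deviation from a flat list of integer grayscale intensities. It is not the authors' pipeline. The function name `numerical_signature` is hypothetical, and the specific definitions used here (population moments, excess kurtosis, and Shannon entropy of the gray-level histogram in bits) are assumptions, since the abstract does not state which variants were used.

```python
import math
from collections import Counter


def numerical_signature(pixels):
    """Compute a four-descriptor signature from integer gray levels.

    Assumed definitions (not confirmed by the source):
    population moments, excess kurtosis, Shannon entropy in bits.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("pixels must not be empty")

    mean = sum(pixels) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n

    std_dev = math.sqrt(m2)
    if m2 == 0:
        # A constant image has no shape information; report zeros by convention.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (normal distribution -> 0)

    # Shannon entropy of the empirical gray-level distribution.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
        "std_dev": std_dev,
    }


# Toy input: half black, half white pixels (not real fragment data).
sig = numerical_signature([0, 0, 255, 255])
```

For this toy input the distribution is symmetric with two equally likely gray levels, so the skewness is 0 and the entropy is 1 bit. A real workflow would first apply the segmentation step so that only fragment pixels, not background, enter the calculation.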

## Linked entities

- **Species:** Sitophilus zeamais (taxon 7047), Sitophilus oryzae (taxon 7048), Sitophilus granarius (taxon 7046), Tribolium castaneum (taxon 7070), Tribolium confusum (taxon 7071), Latheticus oryzae (taxon 466960)

## Full-text entities

- **Species:** Tenebrionidae (darkling beetles, family) [taxon 7065], Tribolium confusum (confused flour beetle, species) [taxon 7071], Sitophilus zeamais (maize weevil, species) [taxon 7047], Sitophilus oryzae (rice weevil, species) [taxon 7048], Tribolium castaneum (red flour beetle, species) [taxon 7070], Sitophilus granarius (granary weevil, species) [taxon 7046], Latheticus oryzae (longheaded flour beetle, species) [taxon 466960]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12764967/full.md

---
Source: https://tomesphere.com/paper/PMC12764967