# CISCS: Classification of inter-class similarity based medicinal plant species groups with machine learning

**Authors:** N. Shobha Rani, Bhavya K R, I. Jeena Jacob, Pushpa B. R, Bipin Nair BJ, Akshatha Prabhu

PMC · DOI: 10.1016/j.mex.2025.103652 · MethodsX · 2025-09-30

## TL;DR

This paper introduces a machine learning model that improves the classification of visually similar medicinal plant species, especially when datasets are imbalanced.

## Contribution

A novel multi-level feature fusion method is proposed, combining color, texture, and shape features with an ensemble framework to enhance classification accuracy.
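As a rough illustration of the fusion idea, the sketch below builds a normalized 3D (joint RGB) color histogram and concatenates it with other descriptor vectors. The function names, the bin count of 4 per channel, and the placeholder texture vector are assumptions made for illustration; the summary does not state the authors' actual settings or implementation.

```python
def color_histogram_3d(pixels, bins=4):
    """Joint (3D) RGB histogram, flattened and normalized to sum to 1.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    bins:   bins per channel (hypothetical choice, not taken from the paper).
    """
    hist = [0.0] * (bins ** 3)
    width = 256 / bins
    count = 0
    for r, g, b in pixels:
        i, j, k = int(r // width), int(g // width), int(b // width)
        hist[(i * bins + j) * bins + k] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


def fuse(*feature_vectors):
    """Feature-level fusion by plain concatenation (an assumed strategy)."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


# Four toy pixels: two reddish, one blue, one green.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 255, 0)]
color = color_histogram_3d(pixels, bins=4)
texture = [0.1, 0.9]  # placeholder standing in for LBP/Gabor/HOG features
fused = fuse(color, texture)
```

Binning the three channels jointly, rather than building three separate 1D histograms, keeps information about which channel values occur together in the same pixel.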

## Key findings

- The proposed model achieved 100% accuracy in Group 1, 95.82% in Group 3, and over 90% in the other groups of Indian medicinal plant species.
- The model outperformed the deep learning baselines ResNet18 and VGG16 in accuracy and robustness; the baselines tended to overfit, with test accuracy falling to 73.99% in certain groups.
- The integration of SMOTE-based augmentation and soft-voting ensemble improved performance under class imbalance conditions.
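The core SMOTE idea mentioned in the findings, synthesizing minority-class samples by interpolating in feature space, can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function name, `k=2`, and the fixed seed are assumptions.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Three feature vectors from an under-represented class (toy values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_oversample(minority, n_new=5)
```

In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would normally be preferred over a hand-written version.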

## Abstract

The reliable classification of medicinal plant species plays a vital role in ensuring their quality, authenticity, and safe use in healthcare. However, existing methods often struggle when species exhibit strong visual similarities or when datasets are imbalanced, which limits their effectiveness in practice. Although deep learning models such as ResNet18 and VGG16 have proven influential in image recognition tasks, our experiments showed that they tended to overfit, with validation losses reaching 42.99% and test accuracy falling to 73.99% in certain groups. To overcome these challenges, we introduce a multi-level feature fusion model that combines 3D normalized color histograms, extended uniform Local Binary Patterns (LBP with P = 24, R = 3), multi-orientation Gabor filters, and Histogram of Oriented Gradients (HOG). This approach captures a richer set of visual cues by bringing together global color statistics, detailed textures, frequency-domain patterns, and shape descriptors. To further address class imbalance, we incorporate SMOTE-based synthetic augmentation, which helps balance feature distributions across categories. For classification, we employ a soft-voting ensemble of machine learning classifiers and use cosine similarity metrics to better capture inter-class relationships. Tests on Indian medicinal plant datasets show that our model consistently outperforms deep learning baselines, reaching 100% accuracy in Group 1, 95.82% in Group 3, and over 90% in other groups.
These results suggest that the proposed model offers a more robust and computationally efficient solution for plant species classification, particularly under conditions of high inter-class similarity and dataset imbalance.
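The abstract mentions cosine similarity as the measure of inter-class relationships but this summary does not say how it is applied. One plausible reading, shown here purely as an assumption, is to compare per-class mean feature vectors; the helper names and the labels `species_a` and `species_b` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_centroids(features, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = ["species_a", "species_a", "species_b", "species_b"]
centroids = class_centroids(features, labels)
similarity = cosine_similarity(centroids["species_a"], centroids["species_b"])
```

A value close to 1 would flag two classes whose fused descriptors point in nearly the same direction, which is the situation the paper describes as high inter-class similarity.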

The proposed domain-specific model targets Indian plant species groups with high inter-class visual similarity through a novel feature fusion strategy.

The proposed multi-level feature fusion method integrates 3D normalized color histograms, extended uniform LBP (P = 24, R = 3), multi-orientation Gabor filters, and HOG features to capture color, texture, and shape characteristics.
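To make the "uniform LBP" descriptor more concrete, the sketch below shows the standard rotation-invariant uniform mapping applied to P = 24 thresholded neighbours. This summary does not define the "extended" variant, so treating it as the standard mapping is an assumption, as are the function names. Sampling the 24 neighbours on a circle of radius 3 is omitted; the input is the already-thresholded bit pattern.

```python
def transitions(bits):
    """Number of 0/1 changes when the pattern is read as a circle."""
    n = len(bits)
    return sum(bits[i] != bits[(i + 1) % n] for i in range(n))


def uniform_label(bits):
    """Uniform patterns (at most two transitions) are labelled by their
    number of ones; all other patterns share the single label P + 1."""
    p = len(bits)
    return sum(bits) if transitions(bits) <= 2 else p + 1


def lbp_histogram(patterns, points=24):
    """Normalized histogram over the P + 2 possible labels."""
    hist = [0.0] * (points + 2)
    for bits in patterns:
        hist[uniform_label(bits)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


edge_like = [1] * 5 + [0] * 19   # one run of ones: uniform
noisy = [1, 0] * 12              # alternating bits: non-uniform
flat = [0] * 24                  # no neighbour brighter than the centre
histogram = lbp_histogram([edge_like, noisy, flat, flat])
```

With P = 24 this mapping yields a 26-bin texture descriptor per image or image region, far smaller than the 2^24 raw patterns.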

The proposed work offers a scalable ensemble framework for inter-class similarity analysis that combines SMOTE-based class balancing, feature normalization, and a soft-voting ensemble of diverse classifiers, in support of biodiversity and ecological studies.
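Soft voting averages the class probabilities predicted by each classifier and picks the class with the highest mean, rather than counting hard votes. The sketch below illustrates the rule with made-up probabilities; the function name and the optional weights are assumptions, and this summary does not list which classifiers the authors combined.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-classifier class probabilities by (weighted) averaging.

    probabilities: one list of class probabilities per classifier.
    Returns (index of the winning class, averaged probabilities).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    weights = weights or [1.0] * n_models
    total = sum(weights)
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return winner, averaged


# Three hypothetical classifiers scoring one leaf image over two classes.
# A hard majority vote would pick class 1 (two of three classifiers),
# but the first classifier's high confidence tips the soft vote to class 0.
predictions = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
winner, averaged = soft_vote(predictions)
```

scikit-learn exposes the same rule through `VotingClassifier(voting="soft")`, which requires every member estimator to provide `predict_proba`.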

## Full-text entities

- **Genes:** PGR (progesterone receptor) [NCBI Gene 5241] {aka NR3C3, PR}
- **Diseases:** MULTI (MESH:D015161), LEVEL-FUSION (MESH:D000069337), DL (MESH:D007859), palm disease (MESH:C535620)
- **Chemicals:** essential oil (MESH:D009822), Medicinal Botanical Garden (-)
- **Species:** Trigonella foenum-graecum (fenugreek, species) [taxon 78534], Azadirachta indica (Indian-lilac, species) [taxon 124943], Cynodon dactylon (Bermuda grass, species) [taxon 28909], Zingiber officinale (ginger, species) [taxon 94328], Salvia rosmarinus (rosemary, species) [taxon 39367], Eucalyptus (genus) [taxon 3932], Amaranthus (genus) [taxon 3564], Homo sapiens (human, species) [taxon 9606], Allium cepa (onion, species) [taxon 4679], Calendula officinalis (common marigold, species) [taxon 41496], Citronella (genus) [taxon 159356], Cymbopogon citratus (lemon grass, species) [taxon 66014]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12550246/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12550246/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12550246/full.md

---
Source: https://tomesphere.com/paper/PMC12550246