# Integration of inter-simple sequence repeats with machine learning approach for diversity analysis and authentication of Iranian cotton cultivars

**Authors:** Rasmieh Hamid, Zahra Ghorbanzadeh, Bahman Panahi

PMC · DOI: 10.1016/j.bbrep.2025.102435 · Biochemistry and Biophysics Reports · 2026-01-06

## TL;DR

This study combines inter-simple sequence repeat (ISSR) markers with machine learning to assess genetic diversity and authenticate 18 commercial Iranian cotton cultivars.

## Contribution

Integrating ISSR markers with machine learning classifiers (a decision tree and Random Forest) provides a novel, interpretable framework for cultivar authentication.

## Key findings

- Primers 13, 10, and 26 were identified as the most informative for genetic diversity analysis.
- UPGMA and PCoA grouped cultivars into five distinct genetic clusters.
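- A decision tree and Random Forest discriminated closely related cultivars with high accuracy, and Random Forest feature selection highlighted Primer24_525 and Primer2_766 as key diagnostic loci.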
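The clustering step in the findings above can be illustrated with a small sketch. UPGMA is average-linkage agglomerative clustering: at each step the two clusters with the smallest mean pairwise distance are merged. The sketch below assumes Jaccard distance on the binary band profiles (the similarity coefficient used in the paper is not stated in this summary), and the profile names `A` to `D` and their band scores are invented toy data, not the study's cultivars.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - (shared bands / bands present in either profile)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either

def upgma(profiles):
    """Return the merge history [(cluster_1, cluster_2, distance), ...].

    profiles: dict mapping a name to a list of 0/1 band scores.
    Cluster distance is the unweighted mean of all leaf-to-leaf
    distances, which is the defining property of UPGMA.
    """
    leaf_dist = {
        frozenset((a, b)): jaccard_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def mean_distance(c1, c2):
        total = sum(leaf_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters and merge them.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: mean_distance(*pair))
        merges.append((c1, c2, mean_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical toy profiles (1 = band present, 0 = band absent).
toy_profiles = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 1],
    "D": [0, 0, 1, 1, 0],
}
merges = upgma(toy_profiles)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

Cutting the resulting merge history at a chosen distance threshold yields discrete groups; the threshold that produced the five groups reported here is not given in this summary.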

## Abstract

Cotton (Gossypium hirsutum L.) has experienced extensive breeding in recent decades, leading to a narrowed genetic base that presents challenges for accurate germplasm differentiation and cultivar authentication. This study primarily addresses the lack of reliable, scalable, and interpretable tools for distinguishing closely related Iranian cotton cultivars. To overcome this limitation, the research integrates inter-simple sequence repeat (ISSR) markers with machine learning (ML) algorithms to evaluate genetic diversity and establish diagnostic criteria for cultivar identification. Eighteen commercial cultivars were genotyped using 14 ISSR primers, and the binary-scored data (presence/absence of bands) were used to calculate genetic diversity parameters, including the observed number of alleles (Na), effective number of alleles (Ne), Shannon's information index (I), and expected heterozygosity (He). Primers 13, 10, and 26 were identified as the most informative, yielding the highest values across diversity parameters. Unweighted Pair Group Method with Arithmetic Mean (UPGMA) clustering and principal coordinates analysis (PCoA) revealed five cultivar groups, with several accessions (e.g., Jahesh, Fakhr, Sahel) showing marked genetic distinctiveness. To enhance cultivar authentication, ISSR data were analyzed using ML classifiers. A decision tree model generated transparent band-based rules, while Random Forest feature selection highlighted key diagnostic loci (Primer24_525, Primer2_766). The combined framework achieved high classification accuracy and reproducibility, enabling reliable discrimination among closely related cultivars. These findings demonstrate the novelty and practical utility of integrating multilocus ISSR markers with ML for cultivar authentication, seed certification, and genetic resource management, while also highlighting previously underexplored genetic diversity that can inform cotton breeding programs in Iran.
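As an illustration of how the four diversity parameters named in the abstract can be derived from binary band scores, here is a minimal sketch. It assumes each band is a dominant, biallelic locus in Hardy-Weinberg equilibrium, a common convention for dominant markers, so the null-allele frequency is the square root of the band-absence rate. The software and settings actually used in the paper are not stated in this summary, and the function name `locus_diversity` and the toy scores are hypothetical.

```python
import math

def locus_diversity(bands):
    """Diversity statistics for one dominant, biallelic locus.

    bands: list of 0/1 scores (1 = band present), one per cultivar.
    Assumption: Hardy-Weinberg equilibrium, so the null (band-absent)
    allele frequency q is sqrt(fraction of cultivars lacking the band).
    """
    n = len(bands)
    q = math.sqrt(bands.count(0) / n)   # null-allele frequency
    p = 1.0 - q                         # band-allele frequency
    freqs = [f for f in (p, q) if f > 0]

    na = len(freqs)                                  # observed number of alleles
    ne = 1.0 / sum(f * f for f in freqs)             # effective number of alleles
    shannon = -sum(f * math.log(f) for f in freqs)   # Shannon's information index
    he = 1.0 - sum(f * f for f in freqs)             # expected heterozygosity
    return {"Na": na, "Ne": ne, "I": shannon, "He": he}

# Hypothetical toy locus: band present in three of four cultivars.
stats = locus_diversity([1, 1, 1, 0])
print(stats)
```

Averaging these per-locus values over all bands amplified by a primer gives one way to compare primers, which is the kind of ranking the abstract reports for Primers 13, 10, and 26.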

- Primers 13, 10, and 26 identified as most informative across diversity metrics.
- UPGMA and PCoA revealed five cultivar groups with distinct genetic profiles.
- Decision tree and Random Forest highlighted key diagnostic loci for classification.
- Combined framework achieved high accuracy and reproducibility in cultivar discrimination.
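To make the idea of a diagnostic locus concrete, the sketch below ranks bands by information gain, the quantity an entropy-based decision tree maximizes when it chooses a presence/absence split. This is a simplified stand-in, not the paper's decision tree or Random Forest pipeline, and the locus names, group labels, and band matrix are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(band_column, labels):
    """Entropy reduction from splitting samples on band presence (1) vs absence (0)."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [lab for b, lab in zip(band_column, labels) if b == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain

def rank_loci(matrix, locus_names, labels):
    """Return (gain, locus) pairs sorted from most to least informative."""
    columns = list(zip(*matrix))  # transpose: one tuple of scores per locus
    scored = [(information_gain(col, labels), name)
              for col, name in zip(columns, locus_names)]
    return sorted(scored, reverse=True)

# Hypothetical toy data: rows are samples, columns are band loci.
toy_matrix = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
toy_loci = ["locus_a", "locus_b", "locus_c"]
toy_labels = ["G1", "G1", "G2", "G2"]

ranked = rank_loci(toy_matrix, toy_loci, toy_labels)
for gain, name in ranked:
    print(name, round(gain, 3))
```

A band with high gain translates directly into a readable rule of the form "if the band is present, assign group G1", which is the sense in which band-based rules are transparent. A Random Forest aggregates many such splits over resampled data to produce its feature ranking.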

## Linked entities

- **Species:** Gossypium hirsutum (taxon 3635)

## Full-text entities

- **Species:** Gossypium hirsutum (American cotton, species) [taxon 3635]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12808624/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12808624/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12808624/full.md

---
Source: https://tomesphere.com/paper/PMC12808624