MegaPlantTF: a machine learning framework for comprehensive identification and classification of plant transcription factors
Genereux Akotenou, Asmaa H Hassan, Morad M Mokhtar, Achraf El Allali

TL;DR
MegaPlantTF is a new machine learning tool that helps identify and classify plant transcription factors more accurately and efficiently.
Contribution
MegaPlantTF introduces a novel two-stage machine learning framework integrating k-mer encoding and stacking ensembles for plant TF prediction and classification.
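As a rough illustration of what k-mer encoding of a protein means, the sketch below turns a sequence into a fixed-length vector of overlapping k-mer counts. Several details are assumptions made for illustration and are not stated in this summary: the 20-letter standard amino-acid alphabet, the value of k, the use of raw counts rather than frequencies, skipping k-mers that contain non-standard residues, and the hypothetical helper names `kmer_vocabulary` and `kmer_counts`. It is not MegaPlantTF's actual encoder.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are encoded; MegaPlantTF's exact
# alphabet and choice of k are not given in this summary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vocabulary(k: int) -> dict[str, int]:
    """Hypothetical helper: map every possible k-mer (20**k of them) to a column index."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def kmer_counts(sequence: str, k: int, vocab: dict[str, int]) -> list[int]:
    """Hypothetical helper: count overlapping k-mers in one protein sequence.

    k-mers containing residues outside the alphabet (e.g. X or U) are skipped.
    """
    vec = [0] * len(vocab)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        idx = vocab.get(seq[i:i + k])
        if idx is not None:
            vec[idx] += 1
    return vec


if __name__ == "__main__":
    vocab = kmer_vocabulary(2)
    vec = kmer_counts("MKKAX", 2, vocab)  # k-mers: MK, KK, KA (AX is skipped)
    print(len(vec), sum(vec))  # 400 3
```

With k = 3 the vector already has 20^3 = 8,000 dimensions and is mostly zeros, so a sparse alternative such as scikit-learn's `CountVectorizer(analyzer="char", ngram_range=(3, 3), lowercase=False)`, which learns its vocabulary from the training sequences, is usually more practical for genome-scale inputs.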
Findings
MegaPlantTF achieves strong accuracy and precision in TF prediction and classification.
The method performs well even under stringent thresholds and in genome-wide analysis of Sorghum bicolor.
The framework is publicly available with source code and pretrained models.
Abstract
Understanding the role of transcription factors (TFs) in plants is essential for the study of gene regulation and various biological processes. However, both TF detection and classification remain challenging due to the great diversity and complexity of these proteins. Conventional approaches, such as BLAST, often suffer from high computational complexity and limited performance on less common TF families. We introduce MegaPlantTF, the first comprehensive machine learning and deep learning framework for the prediction (TF versus non-TF) and classification (family-level) of plant TFs. Our method employs k-mer-based protein representations and a two-stage architecture combining a deep feed-forward neural network with a stacking ensemble classifier. To ensure robust performance assessment, we report micro-, macro-, and weighted-average performance metrics, providing a holistic evaluation…
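To make the three averaging schemes concrete, here is a small standard-library sketch that computes micro-, macro- and weighted-average precision for family-level labels. The function name `averaged_precision` and the toy labels are hypothetical, and the same pattern carries over to recall and F1. A family that is never predicted receives precision 0, which matches the value scikit-learn's `precision_score` returns by default (it also emits a warning).

```python
from collections import Counter


def averaged_precision(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Hypothetical helper: micro-, macro- and weighted-average precision
    for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp: Counter = Counter()  # true positives per predicted class
    fp: Counter = Counter()  # false positives per predicted class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[pred] += 1
        else:
            fp[pred] += 1
    support = Counter(y_true)  # number of true instances per class

    # Per-class precision; a class that is never predicted gets 0.0.
    per_class = {
        c: tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels
    }
    # Micro: pool counts over all classes before dividing.
    micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
    # Macro: unweighted mean, so rare families count as much as common ones.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean of per-class precision weighted by support in y_true.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return {"micro": micro, "macro": macro, "weighted": weighted}


if __name__ == "__main__":
    # Toy labels for illustration only, not results from the paper.
    y_true = ["bHLH", "bHLH", "MYB", "MYB", "WRKY"]
    y_pred = ["bHLH", "MYB", "MYB", "MYB", "bHLH"]
    scores = averaged_precision(y_true, y_pred)
    print({k: round(v, 4) for k, v in scores.items()})
    # {'micro': 0.6, 'macro': 0.3889, 'weighted': 0.4667}
```

For single-label classification, micro-averaged precision equals overall accuracy, so it is dominated by large families; the macro average is the view that exposes weak performance on rare families, and the weighted average sits between the two by following the class distribution.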
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Taxonomy
Topics: Genomics and Chromatin Dynamics · Machine Learning in Bioinformatics · Plant Molecular Biology Research
