A BERT-based rice enhancer identification model combined with sequence-representation differential entropy interpretation
Yajing Pu, Xintong Hao, Zhaoqi Zheng, Huiyan Ma, Zhibin Lv

TL;DR
This paper introduces a new model for identifying rice enhancers using BERT and SVM, achieving high accuracy and offering insights into model performance through entropy analysis.
Contribution
A novel RiceEN-BERT-SVM model and a differential entropy-based interpretation framework for enhancer identification in rice.
Findings
The RiceEN-BERT-SVM model achieved 88.05% accuracy in 5-fold cross-validation and 87.55% in independent testing.
Fine-tuning improved accuracy by 6.95%, reaching 93.63% at six iterations.
Differential entropy analysis showed that the model performed best when the distributions of positive and negative samples were most widely separated.
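This summary does not specify the paper's differential entropy estimator. The sketch below shows one way to turn "most widely separated" into a number, under stated assumptions. It treats the components of each sequence's embedding as samples from a Gaussian, so the differential entropy is h = 0.5 ln(2πeσ²) in nats. It then scores the gap between the positive and negative entropy distributions with a Cohen's d-style statistic. The Gaussian estimator, the separation score, and all function names are hypothetical illustrations, not the authors' method.

```python
import math
import statistics


def gaussian_differential_entropy(values):
    """Differential entropy (nats) of a 1-D sample, ASSUMING it is Gaussian:
    h = 0.5 * ln(2 * pi * e * sigma^2), using the population variance."""
    var = statistics.pvariance(values)
    if var <= 0:
        raise ValueError("variance must be positive")
    return 0.5 * math.log(2 * math.pi * math.e * var)


def sequence_entropies(embeddings):
    """One entropy value per sequence; each embedding is a list of floats
    (hypothetically, a pooled DNABERT-2 vector)."""
    return [gaussian_differential_entropy(vec) for vec in embeddings]


def separation_score(pos_entropies, neg_entropies):
    """Hypothetical separation measure: |mean difference| / pooled std.
    Each group needs at least two values for statistics.stdev."""
    diff = abs(statistics.mean(pos_entropies) - statistics.mean(neg_entropies))
    s_pos = statistics.stdev(pos_entropies)
    s_neg = statistics.stdev(neg_entropies)
    pooled = math.sqrt((s_pos ** 2 + s_neg ** 2) / 2)
    if pooled == 0:
        return math.inf
    return diff / pooled
```

Under this reading, comparing `separation_score` across training iterations or model variants would show whether higher separation coincides with higher accuracy, which is the relationship the finding above describes.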
Abstract
Rice is a crucial food crop, and research into how its gene expression is regulated is important for molecular breeding and yield improvement. Enhancers are key elements that regulate the spatiotemporally specific expression of genes, and identifying them precisely remains a core challenge in functional genomics. Current deep learning-based methods for rice enhancer identification are limited mainly by the efficiency of their feature extraction and the generalization ability of their model architectures. In response, this study introduces a novel model architecture, RiceEN-BERT-SVM, which uses DNABERT-2 to extract sequence features and a Support Vector Machine (SVM) to classify enhancer sequences. The mechanism by which model performance is optimized is explained through differential entropy analysis of the feature representations. Experimental…
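The abstract describes a two-stage pipeline: DNABERT-2 turns each DNA sequence into a feature representation, and an SVM classifies it as enhancer or non-enhancer. The sketch below covers only the downstream stage, under stated assumptions. Token embeddings are assumed to be already computed (the DNABERT-2 model call is not shown), mean pooling is an assumed aggregation step, and a linear SVM trained with the Pegasos sub-gradient method stands in for whatever SVM implementation and kernel the authors used. The k-fold loop mirrors the 5-fold cross-validation reported above but is not stratified. All function names are hypothetical.

```python
import random


def mean_pool(token_embeddings):
    """Average per-token vectors into one fixed-length sequence vector
    (an ASSUMED pooling choice for the DNABERT-2 output)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[j] for tok in token_embeddings) / n for j in range(dim)]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels must be -1 or +1. A constant 1.0 feature acts as the bias term
    (and is regularized along with the weights, a simplification)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """Sign of the linear decision function (bias feature appended)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def cross_val_accuracy(X, y, k=5, **svm_kwargs):
    """Plain k-fold CV: sample i is assigned to fold i % k."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        w = train_linear_svm([X[i] for i in train_idx],
                             [y[i] for i in train_idx], **svm_kwargs)
        correct += sum(predict(w, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

In practice, each sequence's pooled vector from `mean_pool` would form one row of `X`, with enhancers labeled +1 and non-enhancers -1. A kernel SVM from a library such as scikit-learn would be the more usual choice at the scale of real embedding matrices.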
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics
