# EnDeep4mC predicts DNA N4-methylcytosine sites using a dual-adaptive feature encoding framework in deep ensembles

**Authors:** Shuyu Zhang, Quan Zou, Mengting Niu, Zhibin Lv, Antony Stalin, Ximei Luo

PMC · DOI: 10.1101/gr.280977.125 · Genome Research · 2026-03-01

## TL;DR

EnDeep4mC predicts DNA N4-methylcytosine (4mC) sites by combining species-adaptive feature encoding with ensemble deep learning. It was evaluated on six species and transfers across species groups.

## Contribution

EnDeep4mC introduces a dual-adaptive feature encoding framework that pairs species-specific modeling with ensemble deep learning, improving prediction accuracy and cross-species transferability for 4mC detection.

## Key findings

- EnDeep4mC outperforms current state-of-the-art predictors of DNA N4-methylcytosine sites in an evaluation across six species.
- Cross-species validation shows robust transferability from animal to microbial groups.
- Evolutionary analysis reveals distinct 4mC sequence patterns: stable patterns in prokaryotes versus dynamic sequence combinations in eukaryotes.

## Abstract

DNA N4-methylcytosine (4mC), a key epigenetic modification regulating DNA repair and replication, requires efficient computational detection methods due to experimental limitations. Although machine learning predictors have been proposed, their performance could be enhanced through systematic optimization of feature encoding schemes. Here, we propose EnDeep4mC, a dual-adaptive framework integrating species-specific modeling with ensemble deep learning architectures to systematically optimize feature encoding schemes. Evaluated across six species, EnDeep4mC demonstrates commendable prediction performance and significantly outperforms current state-of-the-art predictors. Cross-species validation confirms its robust transferability from animal to microbe groups. Evolutionary analysis further uncovers the functional differentiation of 4mC sequences in biological evolution: prokaryotic 4mC relies on stable patterns, whereas eukaryotes achieve regulatory plasticity through dynamic sequence combinations, which provides experimental evidence for species-adaptive encoding strategies.
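
To make "feature encoding schemes" and "ensemble" concrete, the following is a minimal sketch of two encodings commonly used for DNA sequence windows (one-hot and k-mer frequency) and a simple probability-averaging step. It is an illustration only: the abstract does not state which encodings or which combination rule EnDeep4mC uses, and the function names `one_hot`, `kmer_freq`, and `soft_vote`, as well as the toy window, are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Flatten a DNA sequence into a one-hot vector (4 values per base)."""
    table = {b: [int(b == c) for c in BASES] for b in BASES}
    # Assumption: bases outside ACGT (e.g. N) are encoded as all zeros.
    return [bit for base in seq.upper() for bit in table.get(base, [0, 0, 0, 0])]


def kmer_freq(seq, k=2):
    """Return normalized k-mer frequencies in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def soft_vote(probabilities):
    """Average positive-class probabilities from several base models."""
    return sum(probabilities) / len(probabilities)


if __name__ == "__main__":
    window = "ACGTC"  # hypothetical toy window, not from the paper's data
    print(one_hot(window))
    print(kmer_freq(window, k=2))
    # Hypothetical per-model probabilities that the window contains a 4mC site.
    print(soft_vote([0.9, 0.6, 0.75]))
```

In a species-adaptive setting, one would presumably compare such encodings per species and keep the ones that validate best; how EnDeep4mC performs that selection is described in the full paper, not in this summary.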

## Full-text entities

- **Diseases:** DL (MESH:D007859), DFS (MESH:D000092242)
- **Chemicals:** 5-methylcytosine (MESH:D044503), nucleotide (MESH:D009711), 4mC (MESH:C000612305), N 4-methylcytosine (MESH:C039052), N6-methyladenine (MESH:C005955), poly(A) (MESH:D011061), 4mC (-)
- **Species:** C. elegans [taxon 328850], Geobacillus subterraneus (species) [taxon 129338], Escherichia coli (E. coli, species) [taxon 562], Fragaria vesca (alpine strawberry, species) [taxon 57918], Caenorhabditis elegans (species) [taxon 6239], Casuarina equisetifolia (species) [taxon 3523], Klebsiella pneumoniae (species) [taxon 573], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Enterococcus faecium (species) [taxon 1352], Geobacter pickeringii (species) [taxon 345632], Drosophila melanogaster (fruit fly, species) [taxon 7227], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Listeria monocytogenes (species) [taxon 1639], Bacteria Latreille et al. 1825 (Bacteria stick insect, genus) [taxon 629395]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12951953/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12951953/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12951953/full.md

---
Source: https://tomesphere.com/paper/PMC12951953