# EDEN: multiscale expected density of nucleotide encoding for enhanced DNA sequence classification with hybrid deep learning

**Authors:** Saman Zabihi, Sattar Hashemi, Eghbal Mansoori

PMC · DOI: 10.1186/s12859-026-06367-6 · BMC Bioinformatics · 2026-01-24

## TL;DR

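EDEN is a multiscale, kernel-density-based encoding for DNA sequence classification. Paired with a hybrid deep learning model, it achieves the best average performance across sixteen benchmark datasets while using orders of magnitude fewer parameters than state-of-the-art models, which makes it practical for large-scale genomic studies.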

## Contribution

EDEN introduces a novel multiscale encoding framework using kernel density estimation for DNA sequence classification.

## Key findings

- EDEN achieves the best average performance across sixteen benchmark datasets.
- EDEN uses orders of magnitude fewer parameters than state-of-the-art models.
- EDEN provides a biologically informed and interpretable representation for genomic sequences.

## Abstract

DNA sequences are fundamental carriers of genetic information, and their accurate classification is essential for understanding gene regulation, disease mechanisms, and translational genomics. Existing encoding methods often fail to capture both local and long-range dependencies simultaneously.

We introduce EDEN (Expected Density of Nucleotide Encoding), a unified multiscale encoding framework based on kernel density estimation. EDEN captures position-specific and context-dependent nucleotide patterns and integrates them into a hybrid deep learning architecture. Across sixteen benchmark datasets covering promoter detection, core promoter detection, and transcription factor binding prediction, EDEN achieves the best average performance while using orders of magnitude fewer parameters compared with state-of-the-art models. All source code, pretrained models, and datasets are publicly available at: https://github.com/zabihis/EDEN.
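To make the idea of a kernel-density-based multiscale encoding concrete, here is a minimal sketch. It is not the authors' implementation (that is in the repository linked above): the Gaussian kernel, the bandwidth values, the normalization by sequence length, and the function names `density_track` and `multiscale_encode` are all assumptions chosen for illustration. For each nucleotide, a kernel is placed at every position where it occurs, and the summed density is read off at every position; repeating this with several bandwidths gives narrow (local) and wide (long-range) views of the same sequence.

```python
import math

NUCLEOTIDES = "ACGT"


def density_track(seq, base, bandwidth):
    """Gaussian-kernel density of `base` occurrences, evaluated at each position.

    Assumption: a Gaussian kernel normalized by sequence length; the paper's
    actual kernel and normalization may differ.
    """
    hits = [i for i, c in enumerate(seq) if c == base]
    n = len(seq)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    track = []
    for x in range(n):
        # Sum the kernel contributions of every occurrence of `base`.
        total = sum(math.exp(-0.5 * ((x - i) / bandwidth) ** 2) for i in hits)
        track.append(total / norm)
    return track


def multiscale_encode(seq, bandwidths=(1.0, 4.0, 16.0)):
    """Return len(seq) rows, each with 4 * len(bandwidths) density features.

    The bandwidths are hypothetical: small values emphasize local context,
    large values smooth over longer ranges.
    """
    seq = seq.upper()
    tracks = [density_track(seq, b, h) for h in bandwidths for b in NUCLEOTIDES]
    # Transpose so that each row holds all features for one sequence position.
    return [[t[x] for t in tracks] for x in range(len(seq))]


features = multiscale_encode("ACGTACGTAA")
```

Here `features` has one row per position (10) and one column per nucleotide and bandwidth pair (12). A matrix of this shape could then be fed to a convolutional or recurrent model, which is one plausible reading of the hybrid architecture the abstract mentions.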

EDEN provides an efficient, biologically informed, and interpretable multiscale representation for genomic sequence classification. Its favorable parameter-performance ratio and robust consistency across tasks underscore its practicality for large-scale genomic applications.

## Full-text entities

- **Genes:** RNASE2 (ribonuclease A family member 2) [NCBI Gene 6036] {aka EDN, RAF3, RNS2}, MCC (MCC regulator of Wnt signaling pathway) [NCBI Gene 4163] {aka MCC1}, F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, INHCAP (inhibitor of carbonic anhydrase pseudogene) [NCBI Gene 100129696] {aka TFP, TFP1}, CPD (carboxypeptidase D) [NCBI Gene 1362] {aka GP180}
- **Chemicals:** Nucleotide (MESH:D009711), hydrogen (MESH:D006859), Adenine (MESH:D000225)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12879454/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12879454/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC12879454/full.md

---
Source: https://tomesphere.com/paper/PMC12879454