# dbscATAC: a resource of single-cell super-enhancers/enhancers and gene markers derived from scATAC-seq data

**Authors:** Yingmei Li, Shahid Ullah, Yumei Xian, Yazhou Sun, Zilong Zheng, Xiaoyu Ma, Ming Shi, Changlin Zhang, Tian Li, Leli Zeng, Jie Chen, Yubin Y B Deng, Fuxin Wei, Tianshun Gao

PMC · DOI: 10.1093/bioinformatics/btaf364 · Bioinformatics · 2025-06-23

## TL;DR

dbscATAC is a database that provides detailed annotations of super-enhancers, enhancers, and gene markers from scATAC-seq data across multiple species and cell types.

## Contribution

dbscATAC introduces a comprehensive, annotated resource of super-enhancers, enhancers, gene markers, and enhancer–gene interactions derived from scATAC-seq data using improved machine learning algorithms.

## Key findings

- Identified 213,835 super-enhancers across 520 tissue/cell types in three species.
- Provided 347,484 gene markers, 13,470,526 enhancers, and 10,402,346 enhancer–gene interactions from 1,668,076 single cells spanning 1,028 tissue/cell types in 13 species.
- Developed an online platform for querying and visualizing single-cell regulatory elements.

## Abstract

scATAC-seq enables high-resolution mapping of cis-regulatory elements. It has been widely applied to uncover cell-type-specific regulatory networks and complement scRNA-seq analysis in numerous studies. However, a large number of datasets generated by scATAC-seq remain underutilized due to limited exploration of super-enhancers/typical enhancers and gene markers. A comprehensive resource enabling cell-type-specific annotation of cis-regulatory elements and their dynamic enhancer–gene linkages remains an urgent unmet need for scATAC-seq.

We present dbscATAC, a specialized single-cell database for annotating super-enhancers, gene markers, and enhancer–gene interactions derived from scATAC-seq data. Using improved machine learning algorithms, we identified 213 835 super-enhancers across 520 tissue/cell types from three species, as well as 347 484 gene markers, 13 470 526 enhancers, and 10 402 346 enhancer–gene interactions derived from 1 668 076 single cells spanning 1028 tissue/cell types in 13 species. An easy-to-use online platform with multiple analytic modules and hierarchical query options was developed for searching, browsing and visualizing single-cell super-enhancers, enhancers, and gene markers. dbscATAC provides a comprehensive resource to facilitate the exploration of enhancer landscapes, gene regulation, and cell-type-specific characteristics in single-cell epigenomics.
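For orientation, super-enhancers are conventionally separated from typical enhancers by a rank-ordering heuristic: regions are sorted by signal, both axes are scaled to [0, 1], and the cutoff is placed where a line of slope 1 is tangent to the resulting curve. The sketch below illustrates that conventional heuristic only. The abstract does not detail the improved machine learning algorithms used by dbscATAC, so this should not be read as the authors' method; the function name `call_super_enhancers` and the toy accessibility scores are hypothetical.

```python
def call_super_enhancers(signal):
    """Flag regions whose signal lies above the slope-1 tangent cutoff.

    Illustrative rank-ordering heuristic; NOT the dbscATAC algorithm.
    `signal` is a sequence of per-region scores (hypothetical values here).
    """
    values = [float(s) for s in signal]
    n = len(values)
    if n < 2:
        return [False] * n

    ranked = sorted(values)          # ascending: weakest to strongest region
    lo, hi = ranked[0], ranked[-1]
    if hi == lo:                     # flat signal: nothing stands out
        return [False] * n

    # On the scaled curve, the slope-1 tangent point minimises y - x.
    best_gap, cutoff = None, hi
    for rank, value in enumerate(ranked):
        x = rank / (n - 1)               # scaled rank in [0, 1]
        y = (value - lo) / (hi - lo)     # scaled signal in [0, 1]
        gap = y - x
        if best_gap is None or gap < best_gap:
            best_gap, cutoff = gap, value

    # Regions strictly above the cutoff signal are called super-enhancers.
    return [v > cutoff for v in values]


# Toy example with made-up scores: two regions dominate the distribution.
scores = [100, 1, 3, 50, 2, 1, 4, 3, 2, 1]
print(call_super_enhancers(scores))
# [True, False, False, True, False, False, False, False, False, False]
```

Scaling both axes before locating the tangent point keeps the cutoff independent of the absolute signal units and of the number of regions, which is why the heuristic transfers across datasets of different depth.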

The database, with all super-enhancer/enhancer annotation data, is available at http://singlecelldb.com/dbscATAC/index.php. The source code of dbscATAC for predicting SEs, enhancers, and gene markers is available at https://github.com/EvansGao/dbscATAC. The source code, tissue/cell type descriptions, and data summary can be downloaded at DOI: 10.6084/m9.figshare.28706414.

**Keywords:** scATAC-seq, Database, Super-enhancers/enhancers, Gene markers

## Full-text entities

- **Genes:** SQLE (squalene epoxidase) [NCBI Gene 6713], NANOG (Nanog homeobox) [NCBI Gene 79923], SOX2 (SRY-box transcription factor 2) [NCBI Gene 6657] {aka ANOP3, MCOPS3}, mol (moladietz) [NCBI Gene 34872] {aka 35Bb, B1, BG:DS01219.1, CG15268, CG4482, Dmel\CG4482}, MED1 (mediator complex subunit 1) [NCBI Gene 5469] {aka CRSP1, CRSP200, DRIP205, DRIP230, PBP, PPARBP}, nej (nejire) [NCBI Gene 43856] {aka CBP, CBP/p300, CBP_, CG15319, CG15321, Cbp}, CREBBP (CREB binding lysine acetyltransferase) [NCBI Gene 1387] {aka CBP, KAT3A, MKHK1, RSTS, RSTS1}, crc (cryptocephal) [NCBI Gene 47767] {aka 929, ATF-4, ATF4, ATF4/crc, Atf4, CG8669}, POU5F1 (POU class 5 homeobox 1) [NCBI Gene 5460] {aka OCT3, OCT4, OCT4Borf1, OTF-3, OTF3, OTF4}, MED1 (Mediator complex subunit 1) [NCBI Gene 40403] {aka CG7162, Dmel\CG7162, FBgn0037109, Med1/SOP3/TRAP220, Trap220, dTRAP220}, BRD4 (bromodomain containing 4) [NCBI Gene 23476] {aka CAP, CDLS6, FSHRG4, HUNK1, HUNKI, MCAP}
- **Diseases:** SEs (MESH:C535318), cancer (MESH:D009369)
- **Chemicals:** scATAC (-)
- **Species:** Chlorocebus pygerythrus (vervet, species) [taxon 60710], Cercopithecidae (monkey, family) [taxon 9527], Danio rerio (leopard danio, species) [taxon 7955], Macaca mulatta (rhesus macaque, species) [taxon 9544], Homo sapiens (human, species) [taxon 9606], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Drosophila melanogaster (fruit fly, species) [taxon 7227], Callithrix jacchus (common marmoset, species) [taxon 9483], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Rattus norvegicus (brown rat, species) [taxon 10116], Pan troglodytes (chimpanzee, species) [taxon 9598], Mus musculus (house mouse, species) [taxon 10090], Gallus gallus (bantam, species) [taxon 9031]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12237509/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12237509/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12237509/full.md

---
Source: https://tomesphere.com/paper/PMC12237509