# PETScan: score-based genome-wide association analysis of RNA-Seq and ATAC-Seq data

**Authors:** Yajing Hao, Tal Kafri, Fei Zou

PMC · DOI: 10.1093/bioinformatics/btaf672 · Bioinformatics · 2026-02-20

## TL;DR

PETScan is a score-based method for genome-wide association analysis between RNA-Seq gene expression and ATAC-Seq chromatin accessibility. It is much faster than Wald-test-based analysis.

## Contribution

PETScan tests gene-peak associations genome-wide, not only in cis. It models RNA-Seq counts with negative binomial models and uses score tests with matrix calculations, which makes it much faster than Wald-test-based analysis.

## Key findings

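- PETScan was three orders of magnitude faster than Wald tests on real-world datasets while identifying similar significant gene-peak pairs.
- Unlike Pearson or Spearman correlations, it uses negative binomial models that account for the count-based nature of RNA-Seq data.
- It gains computational efficiency from score tests and matrix calculations. It combines empirical permutation with genomic control to keep p-values valid when sample sizes are limited.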
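The following minimal Python sketch illustrates why score tests and matrix calculations save time. It rests on stated assumptions. For one gene, the negative binomial model is fitted only once, under the null. In this sketch the null has an intercept plus a log library-size offset, and the dispersion `theta` is treated as known. The score statistic for every ATAC peak then comes from a few matrix products. A Wald test would instead refit the model for each gene-peak pair. The function names `fit_null_nb` and `nb_score_stats`, the fixed-dispersion simplification, and the simulated data are illustrative assumptions. They are not the PETScan R package's API or its exact procedure.

```python
import math
import numpy as np


def fit_null_nb(y, Z, offset, theta, n_iter=100, tol=1e-10):
    """IRLS fit of a negative binomial GLM with log link and fixed dispersion
    theta under the null model log(mu) = offset + Z @ alpha. Returns mu."""
    mu = y + 0.5                                   # GLM-style starting values
    for _ in range(n_iter):
        eta = np.log(mu)
        w = mu * theta / (mu + theta)              # NB working weights
        z = (eta - offset) + (y - mu) / mu         # working response
        alpha = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * z))
        mu_new = np.exp(offset + Z @ alpha)
        converged = np.max(np.abs(mu_new - mu)) < tol * (1.0 + np.max(mu))
        mu = mu_new
        if converged:
            break
    return mu


def nb_score_stats(y, X, Z, offset, theta):
    """Score statistics for H0: beta_j = 0 in log(mu) = offset + Z @ alpha
    + X[:, j] * beta_j, computed for every peak column j of X at once."""
    mu = fit_null_nb(y, Z, offset, theta)          # one null fit per gene
    w = mu * theta / (mu + theta)
    r = theta * (y - mu) / (mu + theta)            # d loglik / d eta at the null fit
    U = X.T @ r                                    # score, one entry per peak
    # Efficient information x'(W - WZ(Z'WZ)^{-1}Z'W)x for all peaks at once:
    # project sqrt(W) X off the column space of sqrt(W) Z and sum the squares.
    sw = np.sqrt(w)
    Q, _ = np.linalg.qr(sw[:, None] * Z)
    Xw = sw[:, None] * X
    Xres = Xw - Q @ (Q.T @ Xw)
    V = np.sum(Xres ** 2, axis=0)
    return U ** 2 / V                              # approx. chi-square(1) under H0


# Simulated example: 60 samples, 200 peaks; only peak 0 affects the gene.
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.normal(size=(n, p))                        # e.g. normalized accessibility
Z = np.ones((n, 1))                                # intercept-only null model
offset = np.log(rng.uniform(0.8, 1.2, size=n))     # log library-size factors
theta = 5.0                                        # dispersion, assumed known
mu_true = np.exp(offset + 3.0 + 0.5 * X[:, 0])
y = rng.negative_binomial(theta, theta / (theta + mu_true)).astype(float)

stats = nb_score_stats(y, X, Z, offset, theta)
pvals = np.array([math.erfc(math.sqrt(t / 2.0)) for t in stats])  # chi2(1) tail
print(int(np.argmax(stats)), float(pvals.min()))
```

The null fit does not depend on the peaks, so each additional peak adds only one column to the matrix products. Under a Wald-test approach, cost grows with the number of iterative model fits instead.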

## Abstract

High-dimensional sequencing data, such as RNA-Seq for gene expression and ATAC-Seq for chromatin accessibility, are widely used in systems biology. Accessible chromatin allows transcription factors and regulatory elements to bind to DNA, thereby regulating transcription through the activation or repression of target genes. The association analysis of RNA-Seq and ATAC-Seq data provides insights into gene regulatory mechanisms. Most existing analytic tools focus exclusively on cis-associations, even though regulatory elements can physically interact with distant target genes. Furthermore, conventional approaches often use Pearson or Spearman correlations, which ignore the count-based nature of RNA-Seq data.

To address these limitations, we introduce PETScan, a computationally efficient genome-wide PEak-Transcript Score-based association analysis that uses negative binomial models to better accommodate RNA-Seq data. We leverage score tests and matrix calculations for improved computational efficiency. We also combine an empirical permutation method with genomic control to ensure valid p-value calculations in studies with limited sample sizes. In real-world datasets, PETScan was three orders of magnitude faster than Wald tests while identifying similar significant gene-peak pairs.
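The Python sketch below shows one way permutation and genomic control can work together. It is an assumption, not PETScan's documented procedure. Permuting sample labels breaks any true gene-peak link, so statistics recomputed after permutation form an empirical null. Dividing the null's median by the chi-square(1) median (about 0.455) estimates the inflation factor lambda, which is then used to deflate the observed statistics. The same null also yields empirical p-values. The helper names, the cap of lambda at 1, and the synthetic inflated null are illustrative choices.

```python
import math
from statistics import NormalDist

import numpy as np

# Median of the chi-square(1) distribution, (z_0.75)^2, about 0.455.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_1_sf(t):
    """Upper-tail probability of a chi-square(1) statistic t."""
    return math.erfc(math.sqrt(t / 2.0))


def genomic_control(obs_stats, null_stats):
    """Estimate the inflation factor lambda from null statistics (for example,
    statistics recomputed on permuted samples) and deflate the observed ones."""
    lam = max(float(np.median(null_stats)) / CHI2_1_MEDIAN, 1.0)  # never inflate
    adjusted = np.asarray(obs_stats, dtype=float) / lam
    pvals = np.array([chi2_1_sf(t) for t in adjusted])
    return lam, adjusted, pvals


def empirical_pvalues(obs_stats, null_stats):
    """Permutation p-values (1 + #{null >= obs}) / (1 + #null)."""
    null_sorted = np.sort(np.asarray(null_stats, dtype=float))
    n_ge = null_sorted.size - np.searchsorted(null_sorted, obs_stats, side="left")
    return (1.0 + n_ge) / (1.0 + null_sorted.size)


# Illustration with a synthetic null distribution inflated by a factor of 1.3.
rng = np.random.default_rng(1)
null_stats = 1.3 * rng.chisquare(1, size=20000)
obs_stats = np.array([0.5, 4.0, 30.0])
lam, adjusted, p_gc = genomic_control(obs_stats, null_stats)
p_emp = empirical_pvalues(obs_stats, null_stats)
print(round(lam, 2), p_gc, p_emp)
```

Capping lambda at 1 follows the usual genomic-control convention: statistics are never made larger than computed. With few samples, asymptotic chi-square(1) p-values can be miscalibrated, and that is the case this calibration step targets.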

The PETScan R package is available on GitHub at https://github.com/yajing-hao/PETScan.

## Full-text entities

- **Genes:** Cldn4 (claudin 4) [NCBI Gene 12740] {aka Cep-r, Cpetr, Cpetr1}, Id2 (inhibitor of DNA binding 2) [NCBI Gene 15902] {aka Idb2, bHLHb26}, Egfr (epidermal growth factor receptor) [NCBI Gene 13649] {aka 9030024J15Rik, Erbb, Errb1, Errp, Wa5, wa-2}, Hnf1b (HNF1 homeobox B) [NCBI Gene 21410] {aka HNF-1-beta, HNF-1B, HNF-1Beta, Hnf1beta, LFB3, Tcf-2}, Neurod6 (neurogenic differentiation 6) [NCBI Gene 11922] {aka Atoh2, Math-2, Math2, Nex, Nex1m, bHLHa2}, Rpl32 (ribosomal protein L32) [NCBI Gene 19951] {aka rpL32-3A}, Lhx2 (LIM homeobox protein 2) [NCBI Gene 16870] {aka LH2A, Lh-2, Lim2, ap, apterous}, Zbtb18 (zinc finger and BTB domain containing 18) [NCBI Gene 30928] {aka RP58, Zfp238, Znf238., zfp-238}, Bicdl2 (BICD family like cargo adaptor 2) [NCBI Gene 212733] {aka BICDR-2, Ccdc64b}, Nfix (nuclear factor I/X) [NCBI Gene 18032] {aka CTF, NF-I/X, NF1-X, NFI-X}, Nrxn3 (neurexin III) [NCBI Gene 18191], Barx2 (BarH-like homeobox 2) [NCBI Gene 12023] {aka 2310006E12Rik, Barx2b}, Pkp3 (plakophilin 3) [NCBI Gene 56460] {aka 2310056L12Rik}, Car12 (carbonic anhydrase 12) [NCBI Gene 76459] {aka 2310047E01Rik, CA-XII, Ca12}, St8sia5 (ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 5) [NCBI Gene 225742] {aka ST8SiaV, Siat8e}, Itpr3 (inositol 1,4,5-triphosphate receptor 3) [NCBI Gene 16440] {aka IP3R 3, IP3R-3, Ip3r3, Itpr-3, tf}, Egf (epidermal growth factor) [NCBI Gene 13645], Trpv6 (transient receptor potential cation channel, subfamily V, member 6) [NCBI Gene 64177] {aka CAT, CaT1, Cac, Ecac2, Otrpc3}, Slc4a4 (solute carrier family 4 (anion exchanger), member 4) [NCBI Gene 54403] {aka NBC, NBC1}, Grhl2 (grainyhead like transcription factor 2) [NCBI Gene 252973] {aka 0610015A08Rik, BOM, Tcfcp2l3, clft3}, Fam3b (FAM3 metabolism regulating signaling molecule B) [NCBI Gene 52793] {aka 2-21, 9030624C24Rik, D16Jhu19e, ORF9, Pander}, Rhov (ras homolog family member V) [NCBI Gene 228543] {aka A030005A06Rik, Arhv}
- **Diseases:** hepatocellular carcinoma (MESH:D006528), maturity onset diabetes (MESH:D003924), tumor (MESH:D009369), diabetes (MESH:D003920), pancreatic disorders (MESH:D010195), asthma (MESH:D001249), pancreatic cancer (MESH:D010190)
- **Chemicals:** calcium (MESH:D002118), glucose (MESH:D005947)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs66817580, rs4444903

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12930850/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12930850/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/PMC12930850/full.md

---
Source: https://tomesphere.com/paper/PMC12930850