# Differentially expressed heterogeneous overdispersion genes testing for count data

**Authors:** Yubai Yuan, Qi Xu, Agaz Wani, Jan Dahrendorff, Chengqi Wang, Arlina Shen, Janelle Donglasan, Sarah Burgan, Zachary Graham, Monica Uddin, Derek Wildman, Annie Qu

PMC · DOI: 10.1371/journal.pone.0300565 · 2024-07-17

## TL;DR

This paper introduces DEHOGT, a method for identifying differentially expressed genes in RNA-seq count data that improves detection power when replicates per condition are few but the number of conditions is large.

## Contribution

DEHOGT combines heterogeneous overdispersion modeling with a post-hoc inference procedure to detect differentially expressed genes in RNA-seq data with higher power.

## Key findings

- DEHOGT outperforms DESeq2 and edgeR in detecting differentially expressed genes in synthetic RNA-seq data.
- On RNA-seq data from microglial cells under different stress hormone treatments, DEHOGT tends to detect more differentially expressed genes.
- The method enhances detection power when the number of replicates is limited but the number of conditions is large.

## Abstract

mRNA-seq data analysis is a powerful approach for inferring information from biological systems of interest. Specifically, the sequenced RNA fragments are aligned with genomic reference sequences, and we count the number of sequence fragments corresponding to each gene for each condition. A gene is identified as differentially expressed (DE) if the difference in its counts between conditions is statistically significant. Several statistical analysis methods have been developed to detect DE genes based on RNA-seq data. However, existing methods can lose power to identify DE genes because of overdispersion and limited sample size, where overdispersion refers to the empirical phenomenon that the variance of read counts is larger than the mean of read counts. We propose a new differential expression analysis procedure: heterogeneous overdispersion genes testing (DEHOGT), based on heterogeneous overdispersion modeling and a post-hoc inference procedure. DEHOGT integrates sample information from all conditions and provides more flexible and adaptive overdispersion modeling for RNA-seq read counts. DEHOGT adopts a gene-wise estimation scheme to enhance the detection power for differentially expressed genes when the number of replicates is limited, as long as the number of conditions is large. DEHOGT is tested on synthetic RNA-seq read count data and outperforms two popular existing methods, DESeq2 and edgeR, in detecting DE genes. We apply the proposed method to a test dataset of RNA-seq data from microglial cells. DEHOGT tends to detect more differentially expressed genes potentially related to microglial cells under different stress hormone treatments.
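To make the notion of overdispersion concrete, the sketch below compares the mean and sample variance of read counts for a single gene and derives a method-of-moments dispersion under the standard negative binomial relation, variance = mean + dispersion × mean². This is only an illustration of the definition given in the abstract, not DEHOGT's estimator; the function name `moment_dispersion`, the gene labels, and the toy counts are all hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Return (mean, sample variance, dispersion) for one gene's read counts.

    The dispersion is a simple method-of-moments value from the negative
    binomial relation var = mean + dispersion * mean**2, floored at 0.
    Illustrative assumption only; this is not the estimator used by DEHOGT.
    """
    if len(counts) < 2:
        raise ValueError("need at least two counts to compute a variance")
    m = mean(counts)
    v = variance(counts)  # unbiased sample variance (divides by n - 1)
    if m == 0:
        return m, v, 0.0
    dispersion = max((v - m) / (m * m), 0.0)
    return m, v, dispersion


# Toy, made-up counts: one gene with variance well above its mean,
# and one gene whose variance does not exceed its mean.
toy_genes = {
    "gene_a": [10, 30, 12, 28],
    "gene_b": [4, 5, 6, 5],
}

for name, counts in toy_genes.items():
    m, v, d = moment_dispersion(counts)
    label = "overdispersed" if v > m else "not overdispersed"
    print(f"{name}: mean={m:.2f} variance={v:.2f} dispersion={d:.3f} ({label})")
```

Computing the value separately for each gene mirrors the gene-wise idea in spirit: each gene gets its own dispersion rather than one shared across genes. With few replicates such per-gene moment estimates are noisy, which is the setting the abstract says DEHOGT addresses by pooling information across conditions.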

## Full-text entities

- **Genes:** NR2F2 (nuclear receptor subfamily 2 group F member 2) [NCBI Gene 7026] {aka ARP-1, ARP1, CHTD4, COUPTF2, COUPTFB, COUPTFII}, CRISPLD2 (cysteine rich secretory protein LCCL domain containing 2) [NCBI Gene 83716] {aka CRISP11, LCRISP2, LGL1}, CREB5 (cAMP responsive element binding protein 5) [NCBI Gene 9586] {aka CRE-BPA, CREB-5, CREBPA}, ABCA8 (ATP binding cassette subfamily A member 8) [NCBI Gene 10351], FKBP5 (FKBP prolyl isomerase 5) [NCBI Gene 2289] {aka AIG6, FKBP51, FKBP54, P54, PPIase, Ptg-10}, NID2 (nidogen 2) [NCBI Gene 22795] {aka NID-2}, NR3C1 (nuclear receptor subfamily 3 group C member 1) [NCBI Gene 2908] {aka GCCR, GCR, GCRST, GR, GRL}, ADPRHL1 (ADP-ribosylhydrolase like 1) [NCBI Gene 113622] {aka ARH2}, TGFB1 (transforming growth factor beta 1) [NCBI Gene 7040] {aka CAEND1, CED, DPD1, IBDIMDE, LAP, TGF-beta1}, PSG1 (pregnancy specific beta-1-glycoprotein 1) [NCBI Gene 5669] {aka B1G1, CD66f, FL-NCA-1/2, PBG1, PS-beta-C/D, PS-beta-G-1}, TLR4 (toll like receptor 4) [NCBI Gene 7099] {aka ARMD10, CD284, TLR-4, TOLL}, FAT3 (FAT atypical cadherin 3) [NCBI Gene 120114] {aka CDHF15, CDHR10, hFat3}, TSC22D3 (TSC22 domain family member 3) [NCBI Gene 1831] {aka DIP, DSIPI, GILZ, TSC-22R}, GPRC5A (G protein-coupled receptor class C group 5 member A) [NCBI Gene 9052] {aka GPCR5A, PEIG-1, RAI3, RAIG1, TIG1}, ROR1 (ROR family WNT receptor 1) [NCBI Gene 4919] {aka NTRKR1, dJ537F10.1}
- **Diseases:** PTSD (MESH:D013313), psychiatric disorder (MESH:D001523), gonorrhea (MESH:D006069), anxiety (MESH:D001007), trauma (MESH:D014947), chlamydia (MESH:D002690), inflammatory (MESH:D007249)
- **Chemicals:** dexl (-), dex (MESH:D003907), hydrocortisone (MESH:D006854), alcohol (MESH:D000438), Mgs (MESH:D008274), cort (MESH:D003348)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** S — Homo sapiens (Human), Colorectal adenoma, Cancer cell line (CVCL_8754), S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232), S1 — Gallus gallus (Chicken), Chicken bursal lymphoma, Cancer cell line (CVCL_1T28)

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11253971/full.md

---
Source: https://tomesphere.com/paper/PMC11253971