# ABDS: a bioinformatics tool suite for analyzing biologically diverse samples

**Authors:** Dongping Du, Saurabh Bhardwaj, Yingzhou Lu, Yizhi Wang, Sarah J. Parker, Zhen Zhang, Jennifer E. Van Eyk, Guoqiang Yu, Robert Clarke, David M. Herrington, Yue Wang

PMC · DOI: 10.21203/rs.3.rs-4419408/v1 · 2024-05-30

## TL;DR

ABDS is a bioinformatics tool suite for analyzing biologically diverse samples. It targets three related tasks: missing value imputation, signature gene detection, and visualization of multiple sample groups.

## Contribution

ABDS introduces a mechanism-integrated group-wise pre-imputation scheme that retains informative missingness, a cosine-based one-sample test extended to detect group-silenced signature genes, and a unified heatmap for displaying multiple sample groups.

## Key findings

- ABDS improves detection of signature genes by preserving informative missingness.
- The cosine-based test effectively identifies group-silenced signature genes.
- Unified heatmap visualization enhances interpretation of multiple sample groups.
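To make the idea of a cosine-based score for group-silenced genes concrete, here is a minimal sketch. It is not the ABDS implementation or its R API. It assumes non-negative group-mean expression values and assumes that the "silenced in group k" reference pattern is a vector of ones with a zero at position k. The function names and the example genes are hypothetical, and the sketch computes only the similarity score, not any significance assessment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def silenced_reference(n_groups, k):
    """Assumed reference pattern: equal expression everywhere except group k."""
    return [0.0 if g == k else 1.0 for g in range(n_groups)]

def silenced_score(group_means):
    """Return (best_group, score) for the silenced pattern the gene matches best."""
    n = len(group_means)
    scores = [cosine(group_means, silenced_reference(n, k)) for k in range(n)]
    best = max(range(n), key=lambda k: scores[k])
    return best, scores[best]

# Hypothetical group-mean expression for two genes across three groups
genes = {
    "gene_A": [5.0, 5.0, 0.0],  # silent in the third group (index 2)
    "gene_B": [4.0, 5.0, 6.0],  # expressed in all groups
}
results = {name: silenced_score(means) for name, means in genes.items()}
```

Under these assumptions, a gene that is absent in exactly one group scores close to 1 against that group's reference, while a gene expressed in every group scores noticeably lower against all references.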

## Abstract

Bioinformatics software tools are essential to identify informative molecular features that define different phenotypic sample groups. Among the most fundamental and interrelated tasks are missing value imputation, signature gene detection, and differential pattern visualization. However, many commonly used analytics tools can be problematic when handling biologically diverse samples, either when informative missingness occurs at high missing rates with mixed missing mechanisms, or when multiple sample groups are compared and visualized in parallel. We developed the ABDS tool suite specifically for analyzing biologically diverse samples. Collectively, a mechanism-integrated group-wise pre-imputation scheme is proposed to retain informative missingness associated with signature genes, a cosine-based one-sample test is extended to detect group-silenced signature genes, and a unified heatmap is designed to display multiple sample groups. We describe the methodological principles and demonstrate the effectiveness of the three analytics tools under targeted scenarios, supported by comparative evaluations and biomedical showcases. As an open-source R package, the ABDS tool suite complements rather than replaces existing tools and will allow biologists to more accurately detect interpretable molecular signals among phenotypically diverse sample groups.
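The group-wise pre-imputation idea can be illustrated with a small sketch. This is an assumption-laden illustration, not the scheme as implemented in ABDS. It assumes that a gene missing in most samples of one group is likely below the detection limit in that group, so those entries are filled with a low value, while sparsely missing entries are left for a downstream imputer. The function name, the `max_missing_rate` threshold, and the `fill_fraction` parameter are all hypothetical.

```python
def preimpute_groupwise(values, groups, max_missing_rate=0.5, fill_fraction=0.5):
    """Pre-impute one gene's measurements group by group.

    values: measurements for one gene across samples (None = missing)
    groups: group label for each sample, same length as values
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, nothing to anchor a fill value on
    # Assumed low fill value: a fraction of the smallest observed measurement
    low_value = fill_fraction * min(observed)
    out = list(values)
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        n_missing = sum(1 for i in idx if values[i] is None)
        # High within-group missingness is treated as informative (assumption)
        if n_missing / len(idx) > max_missing_rate:
            for i in idx:
                if values[i] is None:
                    out[i] = low_value
    return out

# Hypothetical gene: mostly observed in group A, mostly missing in group B
groups = ["A", "A", "A", "B", "B", "B"]
values = [8.0, None, 9.0, None, None, 2.0]
filled = preimpute_groupwise(values, groups)
```

In this example the single missing value in group A is left untouched, while the two missing values in group B are replaced by the low fill value, so the contrast between the groups is preserved instead of being smoothed away by a global imputer.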

## Full-text entities

- **Genes:** ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, EREG (epiregulin) [NCBI Gene 2069] {aka EPR, ER, Ep}, IL2 (interleukin 2) [NCBI Gene 3558] {aka IL-2, TCGF, lymphokine}, STAT5A (signal transducer and activator of transcription 5A) [NCBI Gene 6776] {aka MGF, STAT5}, ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, DSG1 (desmoglein 1) [NCBI Gene 1828] {aka CDHF4, DG1, DSG, EPKHE, EPKHIA, PPKS1}
- **Diseases:** LLOD (MESH:D045745), tumors (MESH:D009369), SG (MESH:C537680), fatty streaks (MESH:D058226), atherogenesis (MESH:D050197), hypoxia (MESH:D000860), breast cancer (MESH:D001943), inflammation (MESH:D007249)
- **Chemicals:** NA (MESH:D012964), eCOT (-), Tamoxifen (MESH:D013629), ROS (MESH:D017382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11160903/full.md

---
Source: https://tomesphere.com/paper/PMC11160903