# ID-GBA: Subgraph Extension With Information Distance Guilt by Association in Complex Networks

**Authors:** PREDRAG OBRADOVIC, VLADIMIR KOVACEVIC, ALEKSANDAR MILOSAVLJEVIC, VARDUHI PETROSYAN

PMC · DOI: 10.1109/access.2025.3622038 · IEEE Access: Practical Innovations, Open Solutions · 2026-03-10

## TL;DR

This paper introduces ID-GBA, a method for expanding disease-related gene clusters in complex networks that outperforms Random Walk with Restarts and Personalized PageRank at recapturing known disease genes.

## Contribution

ID-GBA is a subgraph-extension algorithm that combines information distance with the guilt-by-association principle and sets its cut-off through an automated, data-driven thresholding mechanism.

## Key findings

- ID-GBA achieves significantly higher Normalized Discounted Cumulative Gain (NDCG) scores than Random Walk with Restarts and Personalized PageRank when capturing known disease genes.
- The method successfully expands disease clusters and recaptures known disease genes in nine disease/control gene expression networks.
- ID-GBA includes a built-in, automated, data-driven thresholding mechanism, so users do not have to specify a threshold parameter or a fixed number of nodes for the extended subgraph.
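
The NDCG comparison in the findings above can be made concrete with a small worked example. This is a minimal sketch, assuming binary relevance (a gene scores 1 if it is a known disease gene, 0 otherwise) and the standard log2 rank discount; the paper's exact relevance definition is not given in this summary, and the names `dcg`, `ndcg`, and the placeholder gene labels are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: the item at 1-based rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(ranked_genes, known_genes, k=None):
    """NDCG of a gene ranking under binary relevance (assumption, see lead-in)."""
    ranked = ranked_genes[:k] if k is not None else list(ranked_genes)
    gains = [1.0 if gene in known_genes else 0.0 for gene in ranked]
    # The ideal ranking places as many known genes as possible at the top.
    ideal_hits = min(len(known_genes), len(ranked))
    ideal = [1.0] * ideal_hits + [0.0] * (len(ranked) - ideal_hits)
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

# Placeholder gene labels, not data from the paper.
known = {"G1", "G2"}
perfect = ndcg(["G1", "G2", "G3"], known)   # known genes ranked first
delayed = ndcg(["G3", "G1", "G2"], known)   # known genes pushed down one rank
```

A ranking that places every known gene first scores 1.0, and the score drops as known genes move down the list, which is why a higher NDCG indicates that a method surfaces known disease genes earlier.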

## Abstract

Here, we introduce the ID-GBA (Information Distance Guilt By Association) method to expand highly connected sets of nodes by deploying a novel algorithm for subgraph extension based on the guilt-by-association principle and information distance. In this study, ID-GBA was used to expand disease clusters and identify novel disease genes. We first validate its ability to expand related disease sets from disease/disease graphs built using Open Targets’ gene association scores. We then analyze disease/control gene expression networks and show that ID-GBA recaptures known disease genes in nine disease/control graphs. Compared to existing methods such as Random Walk with Restarts and Personalized PageRank, ID-GBA achieves significantly higher Normalized Discounted Cumulative Gain scores, which indicates superior predictive performance in capturing known disease genes. Additionally, unlike other approaches that require users to specify either a threshold parameter or a fixed number of nodes to include in the extended subgraph, ID-GBA includes a built-in, automated, and data-driven thresholding mechanism. These results establish ID-GBA as a novel open-source tool to uncover hidden relationships in gene/gene, disease/disease, and other complex networks.
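
For context on the baselines named in the abstract, the following is a minimal sketch of Random Walk with Restarts on an unweighted, undirected graph, written as plain power iteration. It is not the paper's implementation and not ID-GBA itself: the function name, the toy path graph, the restart probability of 0.3, and the uniform seed distribution are all assumptions chosen for illustration. Personalized PageRank follows the same update, with the restart mass returned to the seed set.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that jumps back to the seeds.

    adj: dict mapping each node to a list of its neighbours (assumed unweighted).
    seeds: set of seed nodes; the restart mass is spread uniformly over them.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adj[u]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy path graph A - B - C - D with a single seed; labels are placeholders.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_graph, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

The output is a score per node, so turning it into an extended subgraph still requires a user-chosen score threshold or top-k cut-off. That is the manual step the abstract contrasts with ID-GBA's automated thresholding.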

## Full-text entities

- **Genes:** PIK3CB (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta) [NCBI Gene 5291] {aka P110BETA, PI3K, PI3KBETA, PIK3C1}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}
- **Diseases:** glioblastoma multiforme (MESH:D005909), chromophobe renal cell carcinoma (MESH:D002292), Ulcerative Colitis (MESH:D003093), multiple sclerosis (MESH:D009103), Type 2 Diabetes (MESH:D003924), Breast Carcinoma (MESH:D001943), Congenital heart disease (MESH:D006330), Dilated Cardiomyopathy (MESH:D002311), colon cancer (MESH:D015179), Arthritis (MESH:D001168), Psoriasis (MESH:D011565), Chronic Obstructive Pulmonary Disease (MESH:D029424), Lung Adenocarcinoma (MESH:D000077192), lysosomal storage disease (MESH:D016464), Asthma (MESH:D001249), Alzheimer's disease (MESH:D000544), Cancer (MESH:D009369), Parkinson's disease (MESH:D010300)
- **Chemicals:** DSA08946 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12970960/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12970960/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12970960/full.md

---
Source: https://tomesphere.com/paper/PMC12970960