# Topology-based metrics for finding the optimal sparsity in gene regulatory network inference

**Authors:** Nils Lundqvist, Mateusz Garbulowski, Thomas Hillerton, Erik L L Sonnhammer

PMC · DOI: 10.1093/bioinformatics/btaf120 · 2025-03-24

## TL;DR

This paper introduces two topology-based methods for predicting the optimal sparsity of an inferred gene regulatory network (GRN), so that a single GRN with the correct sparsity can be selected when working with real data.

## Contribution

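The paper proposes two topology-based methods for predicting optimal sparsity in GRN inference and benchmarks them on simulated perturbation-based gene expression data with four inference methods: LASSO, Zscore, LSCON, and GENIE3.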

## Key findings

- The new topology-based methods reliably predict sparsity close to the true sparsity in simulated data.
- These methods outperform arbitrary hyperparameter settings for sparsity control in GRN inference.
- The results suggest that the scale-free topology assumption is useful for determining optimal GRN sparsity.

## Abstract

Gene regulatory network (GRN) inference is a complex task aiming to unravel regulatory interactions between genes in a cell. A major shortcoming of most GRN inference methods is that they do not attempt to find the optimal sparsity, i.e. the single best GRN, which is important when applying GRN inference in a real situation. Instead, the sparsity tends to be controlled by an arbitrarily set hyperparameter.
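To make the role of the sparsity hyperparameter concrete, here is a minimal sketch of how a cutoff on edge weights determines how dense the resulting network is. The function names, the random weight matrix, and the definition of density as the fraction of nonzero matrix entries are assumptions chosen for illustration; they are not taken from the paper or its code.

```python
import numpy as np


def threshold_network(weights, cutoff):
    """Keep edges whose absolute weight is at least `cutoff`; zero the rest."""
    return np.where(np.abs(weights) >= cutoff, weights, 0.0)


def edge_density(network):
    """Fraction of matrix entries (possible edges) that are nonzero.

    Assumption: density is measured over all entries, including the diagonal.
    """
    return np.count_nonzero(network) / network.size


# Hypothetical edge weights standing in for the output of an inference method.
rng = np.random.default_rng(0)
weights = rng.normal(size=(50, 50))

# Each cutoff yields a different network; nothing here says which one is best.
cutoffs = [0.5, 1.0, 2.0]
densities = [edge_density(threshold_network(weights, c)) for c in cutoffs]
```

Raising the cutoff removes edges, so every cutoff value corresponds to a different candidate GRN. Without a selection criterion, the choice among them is arbitrary.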

In this paper, two new methods for predicting the optimal sparsity of GRNs are formulated and benchmarked on simulated perturbation-based gene expression data using four GRN inference methods: LASSO, Zscore, LSCON, and GENIE3. Both sparsity prediction methods are defined using the hypothesis that the topology of real GRNs is scale-free, and are evaluated based on their ability to predict the sparsity of the true GRN. The results show that the new topology-based approaches reliably predict a sparsity close to the true one. This ability is valuable for real-world applications where a single GRN is inferred from real data. In such situations, it is vital to be able to infer a GRN with the correct sparsity.
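The idea of choosing sparsity from topology can be sketched as follows. The abstract does not define the paper's two metrics, so this example uses a stand-in score: the R² of a straight-line fit to the log-log out-degree distribution, which is high when the distribution resembles a power law. The function names, the choice of out-degree, and the row-as-regulator convention are all assumptions made for illustration.

```python
import numpy as np


def scale_free_score(network):
    """R^2 of a straight-line fit to the log-log out-degree distribution.

    Assumption: rows are regulators, so the count of nonzero entries per row
    is the out-degree. Returns 0.0 when fewer than three distinct nonzero
    degrees are present, since a line fit is then uninformative.
    """
    out_degree = np.count_nonzero(network, axis=1)
    out_degree = out_degree[out_degree > 0]
    degrees, counts = np.unique(out_degree, return_counts=True)
    if degrees.size < 3:
        return 0.0
    x = np.log(degrees)
    y = np.log(counts / counts.sum())
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    centered = y - y.mean()
    ss_tot = float(centered @ centered)
    if ss_tot == 0.0:
        return 0.0
    return 1.0 - float(residual @ residual) / ss_tot


def pick_sparsity(weights, cutoffs):
    """Return the cutoff whose thresholded network scores highest, plus all scores."""
    scores = [
        scale_free_score(np.where(np.abs(weights) >= c, weights, 0.0))
        for c in cutoffs
    ]
    best = int(np.argmax(scores))
    return cutoffs[best], scores
```

Given a weight matrix from any inference method, `pick_sparsity(weights, cutoffs)` scans the candidate cutoffs and returns the one whose network looks most scale-free under this stand-in score. A least-squares fit on log-log axes is a crude test of power-law behaviour, so this should be read as an illustration of the selection loop rather than of the metrics evaluated in the paper.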

Availability and implementation: https://bitbucket.org/sonnhammergrni/powerlaw_sparsity/ and https://codeocean.com/capsule/4393635/.

## Full-text entities

- **Genes:** SFTPA1 (surfactant protein A1) [NCBI Gene 653509] {aka COLEC4, ILD1, PSP-A, PSPA, SFTP1, SFTPA1B}, GRN (granulin precursor) [NCBI Gene 2896] {aka CLN11, FTD2, GEP, GP88, PCDGF, PEPI}
- **Diseases:** COVID-19 (MESH:D000086382), cancer (MESH:D009369)
- **Species:** Escherichia coli (E. coli, species) [taxon 562], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** K562: Homo sapiens (Human), Blast phase chronic myelogenous leukemia, BCR-ABL1 positive, Cancer cell line (CVCL_0004); HepG2: Homo sapiens (Human), Hepatoblastoma, Cancer cell line (CVCL_0027)

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12057811/full.md

---
Source: https://tomesphere.com/paper/PMC12057811