# Library size-stabilized metacells construction enhances co-expression network analysis in single-cell data

**Authors:** Tianjiao Zhang, Haibin Zhu, Marcel Schulz

PMC · DOI: 10.1371/journal.pcbi.1013697 · PLOS Computational Biology · 2025-11-13

## TL;DR

LSMetacell improves co-expression network analysis of single-cell RNA-seq data by stabilizing library sizes across metacells, which reduces false correlations and reveals biologically meaningful interactions.

## Contribution

LSMetacell introduces a novel computational framework that stabilizes library sizes during metacell construction to correct compositional biases in co-expression analysis.

## Key findings

- LSMetacell reduces false-positive correlations caused by library size variance in single-cell RNA-seq data.
- Applied to Alzheimer’s disease data, LSMetacell identified microglia-specific co-expression modules linked to immune dysregulation and neurodegeneration.
- LSMetacell outperforms conventional methods in preserving biological heterogeneity while mitigating technical noise.

## Abstract

Single-cell RNA sequencing (scRNA-seq) deciphers cell type-specific co-expression networks to resolve biological functions but remains constrained by data sparsity and compositional biases. Conventional metacell construction strategies mitigate sparsity by aggregating transcriptionally similar cells but often neglect systematic biases introduced by compositional data. This neglect leads to spurious co-expression correlations and obscures biologically meaningful interactions. Through mathematical modeling and simulations, we demonstrate that uncontrolled library size variance in traditional metacells inflates false-positive correlations and distorts co-expression networks. Here, we present LSMetacell (Library Size-stabilized Metacells), a computational framework that explicitly stabilizes library sizes across metacells to reduce compositional noise while preserving cellular heterogeneity. By stabilizing library sizes during metacell aggregation, LSMetacell enhances the accuracy of downstream analyses such as Weighted Gene Co-expression Network Analysis (WGCNA). Applied to a postmortem Alzheimer’s disease brain scRNA-seq dataset, LSMetacell revealed robust, cell type-specific co-expression modules enriched for disease-relevant pathways, outperforming the conventional metacell approach. Our work establishes a principled strategy for resolving compositional biases in scRNA-seq data, advancing the reliability of co-expression network inference in studying complex biological systems. This framework provides a generalizable solution for improving transcriptional analyses in single-cell studies.
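To make "stabilized library sizes across metacells" concrete, the sketch below sums raw counts into metacells and then equalizes their depths. It is an illustration only and not the LSMetacell procedure: this summary does not describe how the authors stabilize depth, and the abstract suggests it happens during aggregation rather than afterwards. Downsampling to a common depth is a generic stand-in chosen here, and the function names, toy data, and cell-to-metacell assignment are all assumptions.

```python
import numpy as np

def aggregate_metacells(counts, groups):
    """Sum the raw counts of the cells assigned to each metacell."""
    labels = np.unique(groups)
    return np.vstack([counts[groups == g].sum(axis=0) for g in labels])

def downsample_to_depth(metacells, target, rng):
    """Subsample each metacell's counts without replacement to `target` total counts.

    Generic stand-in for depth stabilization (an assumption, not the paper's method).
    """
    return np.vstack([
        rng.multivariate_hypergeometric(row, target) for row in metacells
    ])

rng = np.random.default_rng(0)

# Toy data (hypothetical): 200 cells x 50 genes with uneven sequencing depth.
n_cells, n_genes = 200, 50
depths = rng.integers(200, 2000, size=n_cells)
proportions = np.full(n_genes, 1.0 / n_genes)
counts = np.array([rng.multinomial(n, proportions) for n in depths])

# Hypothetical assignment: 10 consecutive cells per metacell.
groups = np.repeat(np.arange(20), 10)

meta = aggregate_metacells(counts, groups)
target = int(meta.sum(axis=1).min())       # common depth = shallowest metacell
stabilized = downsample_to_depth(meta, target, rng)

print("depth range before:", meta.sum(axis=1).min(), meta.sum(axis=1).max())
print("depth range after: ", stabilized.sum(axis=1).min(), stabilized.sum(axis=1).max())
```

Downsampling discards reads, so a method that balances depth while choosing which cells to aggregate would retain more information; the sketch only shows what the stabilized end state looks like.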

Gene co-expression analysis is a widely used method to infer functional relationships between genes by measuring correlations in their normalized expression levels. However, in this paper, through mathematical modeling and simulations, we demonstrate that these correlations are systematically skewed, particularly by variability in sequencing depth (library size). This issue distorts co-expression analysis results, inflating false correlations and masking true biological interactions. Traditional methods fail to address library size biases in single-cell studies, where data sparsity compounds these challenges. We introduce LSMetacell, a computational framework that simultaneously tackles single-cell data sparsity and corrects for library size-induced correlation biases. By constructing metacells with stabilized sequencing depths, our method reduces technical noise while preserving biological heterogeneity. Applied to Alzheimer’s disease brain data, LSMetacell uncovered microglia-specific co-expression networks linking immune dysregulation to neurodegeneration. Our work provides a dual solution: enhancing single-cell resolution through cell aggregation and mitigating systematic biases that plague co-expression studies. LSMetacell integrates technical approaches with biological analysis, enabling researchers to extract precise and reproducible findings from compositional data.
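The library size effect described above can be reproduced in a small simulation. The sketch below is not the authors' model. It assumes genes that are independent by construction (every cell has the same multinomial gene proportions), sparse counts, and the common log1p(counts / depth × 10,000) normalization. It then compares the average gene-gene Pearson correlation when depth varies across cells with the case where depth is fixed. All parameter values and function names are illustrative assumptions.

```python
import numpy as np

def simulate_counts(lib_sizes, n_genes=500, seed=0):
    """Draw multinomial counts with identical gene proportions in every cell,
    so there is no true co-expression structure."""
    rng = np.random.default_rng(seed)
    proportions = np.full(n_genes, 1.0 / n_genes)
    return np.array([rng.multinomial(int(n), proportions) for n in lib_sizes])

def log_normalize(counts, scale=1e4):
    """Scale each cell to `scale` total counts, then apply log1p."""
    depth = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / depth * scale)

def mean_offdiag_corr(x):
    """Mean Pearson correlation over all distinct gene pairs."""
    x = x[:, x.std(axis=0) > 0]            # drop constant genes to avoid NaNs
    r = np.corrcoef(x, rowvar=False)
    upper = np.triu_indices_from(r, k=1)
    return r[upper].mean()

rng = np.random.default_rng(1)
n_cells = 1000

# Variable depth: log-uniform between 200 and 4000 counts per cell (assumed range).
variable = np.exp(rng.uniform(np.log(200), np.log(4000), n_cells)).astype(int)
# Stable depth: every cell sequenced to the median of the variable depths.
stable = np.full(n_cells, int(np.median(variable)))

r_var = mean_offdiag_corr(log_normalize(simulate_counts(variable)))
r_stab = mean_offdiag_corr(log_normalize(simulate_counts(stable)))

print(f"mean gene-gene correlation, variable depth: {r_var:.3f}")
print(f"mean gene-gene correlation, stable depth:   {r_stab:.3f}")
```

In this toy setting, the positive shift under variable depth arises because, for sparse counts, the expected log-normalized value of every gene still rises with depth, so all genes co-vary with library size even though none are truly co-expressed. With fixed depth, that shared driver is absent.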

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)

## Full-text entities

- **Diseases:** Alzheimer's disease (MESH:D000544)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12626273/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12626273/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12626273/full.md

---
Source: https://tomesphere.com/paper/PMC12626273