# Topology entropy: Enhancing graph partitioning for TAD identification and single-cell clustering

**Authors:** Qiushi Liang, Shengjie Zhao, Lingxi Chen, Shuai Cheng Li

PMC · DOI: 10.1016/j.csbj.2025.04.037 · Computational and Structural Biotechnology Journal · 2025-04-30

## TL;DR

This paper introduces topology entropy, an entropy measure for the structure of biological graphs, and uses it to partition them: identifying topologically associated domains (TADs) in Hi-C contact maps and clustering cells in single-cell sequencing data.

## Contribution

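The paper introduces the topology entropy encoding tree, shows that minimizing its entropy is equivalent to optimal graph partitioning, and develops two methods: TEC-O for partitioning ordered biological graphs and TEC-U for unordered ones.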
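The summary does not reproduce how topology entropy is defined, so the sketch below only illustrates the ordered-graph case that TEC-O addresses. As a stand-in, it assumes the two-dimensional structural entropy of Li and Pan (2016), which is a sum of per-module terms. Under that assumption, and because a TAD is a run of consecutive bins in a Hi-C contact map, the minimum-entropy partition into contiguous segments can be found exactly with a dynamic program over segment end points. The function names `segment_cost` and `segment_ordered_graph`, the `max_len` cap, and the toy matrix are hypothetical and do not describe TEC-O's actual interface or data.

```python
import math
import numpy as np

# ASSUMPTION: Li & Pan's two-dimensional structural entropy stands in for the
# paper's topology entropy. A module X_j with volume V_j (sum of degrees) and
# cut weight g_j contributes
#   -(g_j / vol) * log2(V_j / vol) - sum_{i in X_j} (d_i / vol) * log2(d_i / V_j)
# and the entropy of a partition is the sum of these module terms.


def segment_cost(adj, degree, vol_g, start, end):
    """Entropy contribution of the contiguous module [start, end)."""
    vol_m = degree[start:end].sum()
    if vol_m == 0:
        return 0.0
    cut_m = vol_m - adj[start:end, start:end].sum()  # weight leaving the segment
    cost = -(cut_m / vol_g) * math.log2(vol_m / vol_g)
    for d in degree[start:end]:
        if d > 0:
            cost -= (d / vol_g) * math.log2(d / vol_m)
    return cost


def segment_ordered_graph(adj, max_len=None):
    """Split bins 0..n-1 into contiguous segments minimizing total entropy."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    degree = adj.sum(axis=1)
    vol_g = degree.sum()
    max_len = n if max_len is None else max_len
    best = [0.0] + [math.inf] * n  # best[e]: min entropy of bins [0, e)
    back = [0] * (n + 1)           # back[e]: start of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            total = best[start] + segment_cost(adj, degree, vol_g, start, end)
            if total < best[end]:
                best[end], back[end] = total, start
    segments, end = [], n
    while end > 0:  # walk the back-pointers from the last bin
        segments.append((back[end], end))
        end = back[end]
    return segments[::-1]


# Toy "contact map": two 4-bin blocks (weight 5) on a weak background (0.1).
adj = np.full((8, 8), 0.1)
adj[:4, :4] = 5.0
adj[4:, 4:] = 5.0
np.fill_diagonal(adj, 0.0)
print(segment_ordered_graph(adj))  # [(0, 4), (4, 8)]
```

On the toy map the optimum is the two 4-bin blocks. Each segment cost is recomputed from scratch, giving O(n³) work; that keeps the sketch short, but a real contact map would need prefix sums or a tighter `max_len`.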

## Key findings

- On simulated datasets, topology entropy is robust to noise and captures structural information more effectively than existing methods.
- On Hi-C data from five cell lines and ten single-cell sequencing datasets, TEC-O achieves the highest accuracy in TAD detection and TEC-U the highest accuracy in cell clustering.
- The methods yield biologically meaningful insights from Hi-C and single-cell sequencing data.

## Abstract

Entropy quantifies the limits of information compression and provides a theoretical foundation for exploring complex structures in large-scale graphs. However, effective metrics are needed to capture the intricate structural details in biological graphs. In this paper, we introduce the topology entropy encoding tree to quantify the complexity of biological graphs and show that minimizing the associated entropy is equivalent to optimal graph partitioning. We develop two methods, TEC-O and TEC-U, for partitioning ordered and unordered biological graphs. TEC-O is applied to identify Topologically Associated Domains (TADs) in Hi-C contact maps, while TEC-U is used for cell clustering in single-cell sequencing data. Results from simulated datasets demonstrate that topology entropy is robust to noise and effectively captures structural information, outperforming existing methods. Experiments on Hi-C data from five cell lines and ten single-cell sequencing datasets show that TEC-O and TEC-U achieve the highest accuracy in TAD detection and cell clustering, respectively, providing biologically meaningful insights.
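For unordered graphs such as the cell-to-cell similarity graphs that TEC-U clusters, modules need not be contiguous, so a segment dynamic program no longer applies. The summary does not say how TEC-U searches for partitions; the sketch below swaps in a simple greedy merge heuristic and, again as an assumption, scores partitions with the two-dimensional structural entropy of Li and Pan (2016) rather than the paper's own definition. The names `module_cost` and `greedy_merge` and the toy graph are hypothetical.

```python
import math
import numpy as np

# ASSUMPTION: two-dimensional structural entropy (Li & Pan) stands in for the
# paper's topology entropy. A module with volume V and cut weight g contributes
#   -(g / vol) * log2(V / vol) - sum_i (d_i / vol) * log2(d_i / V)


def module_cost(adj, degree, vol_g, members):
    """Entropy contribution of one (not necessarily contiguous) module."""
    idx = sorted(members)
    vol_m = degree[idx].sum()
    if vol_m == 0:
        return 0.0
    cut_m = vol_m - adj[np.ix_(idx, idx)].sum()  # weight leaving the module
    cost = -(cut_m / vol_g) * math.log2(vol_m / vol_g)
    for d in degree[idx]:
        if d > 0:
            cost -= (d / vol_g) * math.log2(d / vol_m)
    return cost


def greedy_merge(adj):
    """Start from singletons; merge the connected pair of modules with the
    largest entropy decrease until no merge lowers the entropy."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    degree = adj.sum(axis=1)
    vol_g = degree.sum()
    modules = [{i} for i in range(n)]
    costs = [module_cost(adj, degree, vol_g, m) for m in modules]
    while True:
        best_gain, best_pair = 0.0, None
        for a in range(len(modules)):
            for b in range(a + 1, len(modules)):
                if adj[np.ix_(sorted(modules[a]), sorted(modules[b]))].sum() == 0:
                    continue  # skip module pairs that share no edge
                merged = modules[a] | modules[b]
                gain = costs[a] + costs[b] - module_cost(adj, degree, vol_g, merged)
                if gain > best_gain + 1e-12:  # small tolerance against float noise
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        modules[a] |= modules[b]
        costs[a] = module_cost(adj, degree, vol_g, modules[a])
        del modules[b], costs[b]
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(modules):
        labels[sorted(members)] = k
    return labels


# Toy similarity graph: two groups of 4 cells (weight 5), weak background 0.1.
adj = np.full((8, 8), 0.1)
adj[:4, :4] = 5.0
adj[4:, 4:] = 5.0
np.fill_diagonal(adj, 0.0)
print(greedy_merge(adj))  # [0 0 0 0 1 1 1 1]
```

On the toy graph the greedy pass recovers both groups. Pairwise merging is only a heuristic, though: on two triangles joined by a single edge it stops at three two-node modules instead of the two triangles, so a practical method needs a richer search than this single merge move.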

## Full-text entities

- **Genes:** BCL2A1 (BCL2 related protein A1) [NCBI Gene 597] {aka ACC-1, ACC-2, ACC1, ACC2, BCL2L5, BFL1}, GPHA2 (glycoprotein hormone subunit alpha 2) [NCBI Gene 170589] {aka A2, GPA2, ZSIG51}, CTCF (CCCTC-binding factor) [NCBI Gene 10664] {aka CFAP108, FAP108, MRD21}, SMC3 (structural maintenance of chromosomes 3) [NCBI Gene 9126] {aka BAM, BMH, CDLS3, CSPG6, HCAP, SMC3L1}, RAD21 (RAD21 cohesin complex component) [NCBI Gene 5885] {aka CDLS4, HR21, HRAD21, MCD1, MGS, NXP1}
- **Diseases:** breast cancer (MESH:D001943), developmental disorders (MESH:D002658), cancers (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** NHEK: Homo sapiens (Human), Finite cell line (CVCL_9Q50); HMEC: Homo sapiens (Human), Transformed cell line (CVCL_0307); K562: Homo sapiens (Human), Blast phase chronic myelogenous leukemia, BCR-ABL1 positive, Cancer cell line (CVCL_0004); MCF-7: Homo sapiens (Human), Invasive breast carcinoma of no special type, Cancer cell line (CVCL_0031); IMR90: Homo sapiens (Human), Finite cell line (CVCL_0347); GM12878: Homo sapiens (Human), Transformed cell line (CVCL_7526); MDA-MB-468: Homo sapiens (Human), Breast adenocarcinoma, Cancer cell line (CVCL_0419)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12141873/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12141873/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12141873/full.md

---
Source: https://tomesphere.com/paper/PMC12141873