# HiCMC: High-Efficiency Contact Matrix Compressor

**Authors:** Yeremia Gunawan Adhisantoso, Tim Körner, Fabian Müntefering, Jörn Ostermann, Jan Voges

PMC11389233 · DOI: 10.1186/s12859-024-05907-2 · BMC Bioinformatics · 2024-09-10

## TL;DR

HiCMC is a method for compressing Hi-C contact matrices that achieves better compression than existing tools by modeling structures found in the matrix, such as compartments and domains.

## Contribution

HiCMC introduces a novel compression approach for Hi-C data by modeling genomic structures like compartments and domains.

## Key findings

- HiCMC outperforms CMC by approximately 8%, and outperforms cooler, LZMA, and bzip2 by over 50%, in compression efficiency.
- HiCMC integrates domain-specific information into compressed data to accelerate downstream analyses.
- The method works effectively across multiple cell lines and contact matrix resolutions.

## Abstract

Chromosome organization plays an important role in biological processes such as replication, regulation, and transcription. One way to study the relationship between chromosome structure and its biological functions is through Hi-C studies, a genome-wide method for capturing chromosome conformation. Such studies generate vast amounts of data. The problem is exacerbated by the fact that chromosome organization is dynamic, requiring snapshots at different points in time, further increasing the amount of data to be stored. We present a novel approach called the High-Efficiency Contact Matrix Compressor (HiCMC) for efficient compression of Hi-C data.

By modeling the underlying structures found in the contact matrix, such as compartments and domains, HiCMC outperforms the state-of-the-art method CMC by approximately 8% and the other state-of-the-art methods cooler, LZMA, and bzip2 by over 50% across multiple cell lines and contact matrix resolutions. In addition, HiCMC integrates domain-specific information into the compressed bitstreams that it generates, and this information can be used to speed up downstream analyses.
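To make the general-purpose baselines concrete, the following is a minimal sketch of how a contact matrix might be serialized and compressed with LZMA and bzip2 using Python's standard library. It is not HiCMC's method and does not reproduce the paper's benchmark: the synthetic matrix, its distance-decay model, the `uint32` serialization, and all function names are assumptions made for illustration only.

```python
import bz2
import lzma
import random
import struct


def synthetic_contact_matrix(n, seed=0):
    """Build a toy symmetric n x n matrix of non-negative integer counts.

    Assumption: counts decay with distance from the diagonal. This is only
    a stand-in for real Hi-C data.
    """
    rng = random.Random(seed)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            expected = 200.0 / (1 + (j - i))  # toy distance-decay model
            count = int(rng.expovariate(1.0 / expected))
            matrix[i][j] = matrix[j][i] = count
    return matrix


def serialize_upper_triangle(matrix):
    """Pack the upper triangle (diagonal included) as little-endian uint32."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i, n)]
    return struct.pack(f"<{len(values)}I", *values)


def deserialize_upper_triangle(payload, n):
    """Rebuild the full symmetric matrix from the packed upper triangle."""
    values = struct.unpack(f"<{n * (n + 1) // 2}I", payload)
    matrix = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            matrix[i][j] = matrix[j][i] = values[k]
            k += 1
    return matrix


def compressed_sizes(payload):
    """Return byte sizes of the raw payload and its LZMA/bzip2 encodings."""
    return {
        "raw": len(payload),
        "lzma": len(lzma.compress(payload)),
        "bz2": len(bz2.compress(payload)),
    }


if __name__ == "__main__":
    toy = synthetic_contact_matrix(64)
    packed = serialize_upper_triangle(toy)
    for name, size in compressed_sizes(packed).items():
        print(f"{name}: {size} bytes")
```

Only the upper triangle is stored because a contact matrix is symmetric. A general-purpose compressor treats the result as an opaque byte stream, whereas HiCMC is described as modeling the compartment and domain structure before encoding.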

HiCMC is a novel compression approach that utilizes intrinsic properties of the contact matrix, such as compartments and domains. It achieves better compression than the state-of-the-art methods. HiCMC is available at https://github.com/sXperfect/hicmc.

## Full-text entities

- **Diseases:** myelogenous leukemia (MESH:D007951)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:**
  - K562: Homo sapiens (Human), Blast phase chronic myelogenous leukemia, BCR-ABL1 positive, Cancer cell line (CVCL_0004)
  - GM12878: Homo sapiens (Human), Transformed cell line (CVCL_7526)
  - NHEK: Homo sapiens (Human), Finite cell line (CVCL_9Q50)
  - CH12: Mus musculus (Mouse), Mouse lymphoma, Cancer cell line (CVCL_6818)
  - IMR90: Homo sapiens (Human), Finite cell line (CVCL_0347)
  - HMEC: Homo sapiens (Human), Transformed cell line (CVCL_0307)
  - KBM7: Homo sapiens (Human), Chronic myelogenous leukemia, BCR-ABL1 positive, Cancer cell line (CVCL_A426)
  - HUVEC: Homo sapiens (Human), Finite cell line (CVCL_2959)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11389233/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11389233/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC11389233/full.md

---
Source: https://tomesphere.com/paper/PMC11389233