# Adjacency-constrained hierarchical clustering of a band similarity matrix with application to Genomics

**Authors:** Christophe Ambroise (LaMME), Alia Dehman, Pierre Neuvial (IMT), Guillem Rigaill (IPS2, LaMME), Nathalie Vialaneix (MIAT INRA)

arXiv: 1902.01596 · 2019-02-06

## TL;DR

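This paper introduces a quasi-linear implementation of adjacency-constrained hierarchical agglomerative clustering (HAC) for genomic similarity matrices. By treating the similarity between physically distant loci as negligible, it partitions chromosomes with 10^4 to 10^5 loci in minutes or even seconds on a standard laptop, and the resulting regions highlight biologically meaningful signals.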

## Contribution

The paper presents an implementation of adjacency-constrained HAC with quasi-linear complexity, in place of the quadratic time and space cost of the standard approach. This makes the method suitable for high-resolution genomic data, as illustrated on GWAS and Hi-C datasets.

## Key findings

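- The method highlights biologically meaningful signals in GWAS and Hi-C datasets
- It runs on a standard laptop in minutes or even seconds
- It relies on the assumption that the similarity between physically distant objects is negligible, which the GWAS and Hi-C illustrations support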
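
The negligible-similarity assumption in the last bullet means that only a band of the similarity matrix around the diagonal has to be kept. The sketch below illustrates that idea in Python with NumPy. The function names `to_band` and `band_lookup` and the bandwidth parameter `h` are hypothetical and chosen for illustration; they are not the API of the adjclust package.

```python
import numpy as np


def to_band(S, h):
    """Keep only diagonals 0..h of a symmetric similarity matrix S.

    band[i, d] holds S[i, i + d]. Positions that would fall outside
    the matrix are left at 0. (Illustrative helper, not adjclust code.)
    """
    p = S.shape[0]
    h = min(h, p - 1)  # a band cannot be wider than the matrix
    band = np.zeros((p, h + 1))
    for d in range(h + 1):
        band[: p - d, d] = np.diagonal(S, offset=d)
    return band


def band_lookup(band, i, j):
    """Similarity between loci i and j, taken to be 0 beyond the band."""
    i, j = min(i, j), max(i, j)
    d = j - i
    return band[i, d] if d < band.shape[1] else 0.0
```

Under this layout the storage is p × (h + 1) values instead of p × p. With an illustrative bandwidth of h = 100 and p = 10^5 loci, that is about 1.01 × 10^7 values rather than 10^10.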

## Abstract

**Motivation:** Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. A major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of 10^4 to 10^5 for each chromosome.

**Results:** By assuming that the similarity between physically distant objects is negligible, we propose an implementation of this adjacency-constrained HAC with quasi-linear complexity. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds.

**Availability and Implementation:** Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
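
The adjacency constraint described in the abstract can be illustrated with a small Python sketch. It is a naive version that recomputes every candidate merge at each step, so it does not reproduce the quasi-linear implementation proposed in the paper. It also assumes a Ward-type linkage computed from the similarity matrix, which the abstract does not specify. The function names are hypothetical.

```python
import numpy as np


def ward_linkage(S, left, right):
    """Ward-type dissimilarity between adjacent intervals [a, b) and [b, c).

    Assumption: S is a symmetric similarity (kernel-like) matrix and the
    linkage is the Ward criterion expressed through block sums of S.
    """
    a, b = left
    _, c = right
    n_l, n_r = b - a, c - b
    s_ll = S[a:b, a:b].sum()
    s_rr = S[b:c, b:c].sum()
    s_lr = S[a:b, b:c].sum()
    return (n_l * n_r / (n_l + n_r)) * (
        s_ll / n_l**2 + s_rr / n_r**2 - 2 * s_lr / (n_l * n_r)
    )


def adjacency_constrained_hac(S):
    """Naive adjacency-constrained HAC; returns the list of merges.

    Clusters are half-open intervals of consecutive positions, and only
    neighbouring intervals are candidates for a merge.
    """
    p = S.shape[0]
    clusters = [(i, i + 1) for i in range(p)]
    merges = []
    while len(clusters) > 1:
        # Evaluate only the len(clusters) - 1 adjacent pairs.
        costs = [
            ward_linkage(S, clusters[k], clusters[k + 1])
            for k in range(len(clusters) - 1)
        ]
        k = int(np.argmin(costs))
        left, right = clusters[k], clusters[k + 1]
        merges.append((left, right, float(costs[k])))
        clusters[k : k + 2] = [(left[0], right[1])]
    return merges
```

On a toy similarity matrix made of two blocks of ones, for example `np.kron(np.eye(2), np.ones((3, 3)))`, the positions inside each block are merged first and the two blocks are joined only in the final step.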

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.01596/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1902.01596/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1902.01596/full.md

---
Source: https://tomesphere.com/paper/1902.01596