# On the bias of H-scores for comparing biclusters, and how to correct it

**Authors:** Jacopo Di Iorio, Francesca Chiaromonte, Marzia A. Cremona

arXiv: 1907.11142 · 2020-07-08

## TL;DR

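This paper shows that the H-score, the evaluation score underlying Cheng and Church's biclustering algorithm and many later methods, is biased: its average value increases with the number of rows and columns in a bicluster, so it favors small biclusters. The authors propose a correction that enables fair comparisons across biclusters of different sizes.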

## Contribution

The authors prove analytically, and confirm by simulation, that the average H-score grows with bicluster size. From the analytical proof they derive a straightforward correction that lets users compare biclusters of different sizes accurately.

## Key findings

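- The average H-score increases with the number of rows and columns in a bicluster.
- Because a lower H-score indicates a more coherent bicluster, the H-score and all algorithms based on it are biased towards small biclusters.
- A correction derived from the analytical proof removes this bias, so biclusters of different sizes can be compared accurately.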
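For context, the H-score of Cheng and Church is the mean squared residue of a bicluster. The notation below (row set $I$, column set $J$, entries $a_{ij}$) is chosen here for illustration and may differ from the paper's:

$$
H(I, J) = \frac{1}{|I|\,|J|} \sum_{i \in I} \sum_{j \in J} \left( a_{ij} - a_{iJ} - a_{Ij} + a_{IJ} \right)^2 ,
$$

where $a_{iJ}$ is the mean of row $i$ over the columns in $J$, $a_{Ij}$ is the mean of column $j$ over the rows in $I$, and $a_{IJ}$ is the mean of the whole bicluster.

As an illustration of why size matters, assume (this is an assumption of the sketch, not necessarily the setting used in the paper) that the entries follow an additive model $a_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$ with independent noise of variance $\sigma^2$. The residues then have $(|I|-1)(|J|-1)$ degrees of freedom, which gives

$$
\mathbb{E}\left[ H(I, J) \right] = \sigma^2 \, \frac{(|I|-1)(|J|-1)}{|I|\,|J|} = \sigma^2 \left( 1 - \frac{1}{|I|} \right) \left( 1 - \frac{1}{|J|} \right).
$$

Under this assumption the expected H-score is strictly below $\sigma^2$ and increases towards it as rows or columns are added, which is consistent with the size dependence reported above.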

## Abstract

In the last two decades several biclustering methods have been developed as new unsupervised learning techniques to simultaneously cluster rows and columns of a data matrix. These algorithms play a central role in contemporary machine learning and in many applications, e.g. to computational biology and bioinformatics. The H-score is the evaluation score underlying the seminal biclustering algorithm by Cheng and Church, as well as many other subsequent biclustering methods. In this paper, we characterize a potentially troublesome bias in this score, that can distort biclustering results. We prove, both analytically and by simulation, that the average H-score increases with the number of rows/columns in a bicluster. This makes the H-score, and hence all algorithms based on it, biased towards small clusters. Based on our analytical proof, we are able to provide a straightforward way to correct this bias, allowing users to accurately compare biclusters.
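The size dependence described in the abstract can be reproduced with a small simulation. The sketch below is a minimal illustration, not the authors' code: it computes the H-score as the mean squared residue of Cheng and Church, averages it over pure Gaussian noise blocks of two sizes, and applies a degrees-of-freedom rescaling. The function names `h_score`, `df_adjusted_h_score`, and `average_scores` are hypothetical, and the rescaling factor is an assumption that follows from an i.i.d. noise model; the correction given in the paper may be stated differently.

```python
import numpy as np


def h_score(block):
    """Mean squared residue (H-score) of a bicluster given as a 2-D array."""
    block = np.asarray(block, dtype=float)
    row_means = block.mean(axis=1, keepdims=True)   # one mean per row
    col_means = block.mean(axis=0, keepdims=True)   # one mean per column
    overall_mean = block.mean()
    residue = block - row_means - col_means + overall_mean
    return float(np.mean(residue ** 2))


def df_adjusted_h_score(block):
    """H-score rescaled by n*m / ((n-1)*(m-1)).

    Assumption of this sketch: under i.i.d. noise the residues have
    (n-1)*(m-1) degrees of freedom, so this rescaling makes the expected
    score independent of the bicluster size.
    """
    n_rows, n_cols = np.asarray(block).shape
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    scale = (n_rows * n_cols) / ((n_rows - 1) * (n_cols - 1))
    return h_score(block) * scale


def average_scores(n_rows, n_cols, n_rep=2000, seed=0):
    """Average raw and adjusted H-scores over n_rep standard-normal blocks."""
    rng = np.random.default_rng(seed)
    raw_total = 0.0
    adjusted_total = 0.0
    for _ in range(n_rep):
        block = rng.normal(size=(n_rows, n_cols))
        raw_total += h_score(block)
        adjusted_total += df_adjusted_h_score(block)
    return raw_total / n_rep, adjusted_total / n_rep


if __name__ == "__main__":
    for shape in [(2, 2), (5, 5), (20, 20)]:
        raw, adjusted = average_scores(*shape)
        print(shape, "raw:", round(raw, 3), "adjusted:", round(adjusted, 3))
```

With unit-variance noise, the average raw score should sit well below 1 for the small blocks and approach 1 as the blocks grow, while the adjusted score should stay close to 1 at every size. A perfectly additive block (row effect plus column effect, no noise) has an H-score of zero under either version.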

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.11142/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1907.11142/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/1907.11142/full.md

---
Source: https://tomesphere.com/paper/1907.11142