Adjusting the adjusted Rand Index -- A multinomial story
Martina Sundqvist, Julien Chiquet, Guillem Rigaill

TL;DR
This paper introduces a new multinomial-based adjusted Rand Index (MARI) that improves cluster comparison by addressing limitations of the traditional ARI, especially in dependent clustering scenarios, and provides an efficient computation method.
Contribution
The authors propose a modified Rand Index (MRI) and its adjusted version (MARI) based on a multinomial model, enhancing interpretability and modeling accuracy over the hypergeometric assumption.
Findings
MARI reduces bias present in ARI under the multinomial model.
Large sample sizes diminish differences between ARI and MARI.
An efficient linear-time algorithm is implemented in the aricode package.
Abstract
The Adjusted Rand Index (ARI) is arguably one of the most popular measures for cluster comparison. The adjustment of the ARI is based on a hypergeometric distribution assumption which is unsatisfying from a modeling perspective as (i) it is not appropriate when the two clusterings are dependent, (ii) it forces the size of the clusters, and (iii) it ignores randomness of the sampling. In this work, we present a new "modified" version of the Rand Index. First, we redefine the MRI by only counting the pairs consistent by similarity and ignoring the pairs consistent by difference, increasing the interpretability of the score. Second, we base the adjusted version, MARI, on a multinomial distribution instead of a hypergeometric distribution. The multinomial model is advantageous as it does not force the size of the clusters, properly models randomness, and is easily extended to the…
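To make the pair-counting vocabulary in the abstract concrete, here is a minimal Python sketch of the quantities involved. It computes the classical Rand Index, the classical hypergeometric-adjusted ARI, and the share of pairs that are grouped together in both clusterings (the pairs "consistent by similarity"). The function names are hypothetical and are not the aricode API. The sketch does not implement MARI itself: the paper's multinomial adjustment and its exact normalization of the MRI are not given in this summary, so they are deliberately left out.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Return (total_pairs, same_both, same_a, same_b) for two clusterings.

    Hypothetical helper for illustration. A sparse contingency table is
    built in a single pass over the observations.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are needed to form a pair")
    joint = Counter(zip(labels_a, labels_b))  # nonzero contingency cells
    rows = Counter(labels_a)                  # cluster sizes in clustering A
    cols = Counter(labels_b)                  # cluster sizes in clustering B
    same_both = sum(comb(c, 2) for c in joint.values())
    same_a = sum(comb(c, 2) for c in rows.values())
    same_b = sum(comb(c, 2) for c in cols.values())
    return comb(n, 2), same_both, same_a, same_b


def rand_index(labels_a, labels_b):
    """Classical Rand Index: pairs consistent by similarity or by difference."""
    total, same_both, same_a, same_b = pair_counts(labels_a, labels_b)
    diff_both = total - same_a - same_b + same_both  # separated in both
    return (same_both + diff_both) / total


def similarity_only_share(labels_a, labels_b):
    """Share of pairs grouped together in both clusterings.

    Assumption: this mirrors the abstract's description of counting only
    pairs consistent by similarity; the paper's normalization may differ.
    """
    total, same_both, _, _ = pair_counts(labels_a, labels_b)
    return same_both / total


def adjusted_rand_index(labels_a, labels_b):
    """Classical ARI, adjusted under the hypergeometric assumption."""
    total, same_both, same_a, same_b = pair_counts(labels_a, labels_b)
    expected = same_a * same_b / total
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        # Both clusterings are all singletons or both are a single cluster.
        return 1.0
    return (same_both - expected) / (max_index - expected)
```

For example, with `a = [0, 0, 1, 1]` and `b = [0, 0, 1, 2]`, only one of the six pairs is grouped together in both clusterings, while four pairs are separated in both. The classical Rand Index counts all five of these pairs as agreement, whereas the similarity-only share counts just the one.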
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Complex Network Analysis Techniques
