# Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets

**Authors:** Artem Lutov, Mourad Khayati, Philippe Cudré-Mauroux

arXiv: 1902.01691 · 2019-02-18

## TL;DR

This paper reviews the metrics used to evaluate the accuracy of overlapping and multi-resolution clustering algorithms on large datasets, and proposes optimizations and extensions of these metrics that improve their efficiency and effectiveness. All implementations are available as open source.

## Contribution

It introduces a new indexing technique that reduces both the runtime and the memory complexity of the Mean F1 score evaluation, and it extends existing metrics so that they better satisfy formal constraints without affecting their efficiency.
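For reference, one common formulation of the Mean (average) F1 score, which the abstract names as the target of the indexing technique, is given below for two overlapping clusterings $A$ and $B$, each a collection of node sets. The exact variant the paper evaluates and extends may differ; in particular, symmetrizing the two directional averages with an arithmetic mean is an assumption here.

$$
F_1(a, b) = \frac{2\,|a \cap b|}{|a| + |b|}
$$

$$
\overline{F_1}(A \to B) = \frac{1}{|A|} \sum_{a \in A} \max_{b \in B} F_1(a, b)
$$

$$
\mathrm{MF}_1(A, B) = \frac{1}{2}\left(\overline{F_1}(A \to B) + \overline{F_1}(B \to A)\right)
$$

The first line is the usual harmonic mean of precision $|a \cap b|/|b|$ and recall $|a \cap b|/|a|$, simplified. Because $F_1(a, b) = 0$ whenever $a \cap b = \emptyset$, the maximum for a cluster $a$ only needs to be searched among clusters that share at least one node with it.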

## Key findings

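- The new indexing technique reduces both the runtime and the memory complexity of the Mean F1 score evaluation
- On a single CPU, the optimized evaluation is faster than state-of-the-art implementations running on high-performance servers
- Open-source C++ implementations of all discussed metrics, usable as stand-alone tools or within a benchmarking system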

## Abstract

The performance of clustering algorithms is evaluated with the help of accuracy metrics. There is a great diversity of clustering algorithms, which are key components of many data analysis and exploration systems. However, there exist only a few metrics for measuring the accuracy of overlapping and multi-resolution clustering algorithms on large datasets. In this paper, we first discuss existing metrics, how they satisfy a set of formal constraints, and how they can be applied to specific cases. Then, we propose several optimizations and extensions of these metrics. More specifically, we introduce a new indexing technique to reduce both the runtime and the memory complexity of the Mean F1 score evaluation. Our technique can be applied to large datasets, and it is faster on a single CPU than state-of-the-art implementations running on high-performance servers. In addition, we propose several extensions of the discussed metrics to improve their effectiveness and their satisfaction of formal constraints without affecting their efficiency. All the metrics discussed in this paper are implemented in C++ and are available for free as open-source packages that can be used either as stand-alone tools or as part of a benchmarking system to compare various clustering algorithms.
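The abstract does not spell out how the indexing technique works, so the following is only a minimal Python sketch of one way an index can speed up Mean F1 evaluation. It is not the authors' C++ implementation or their exact scheme. It assumes each clustering is a list of Python sets of node ids and symmetrizes with an arithmetic mean. The names `f1`, `best_match_avg`, and `mean_f1` are hypothetical. An inverted index maps each node to the clusters that contain it, so each cluster is compared only with clusters it actually overlaps rather than with every cluster in the other clustering.

```python
from collections import defaultdict


def f1(inter: int, size_a: int, size_b: int) -> float:
    # F1 of two node sets from their intersection size: 2|a ∩ b| / (|a| + |b|).
    return 2.0 * inter / (size_a + size_b)


def best_match_avg(src: list[set], dst: list[set]) -> float:
    """Average over src clusters of the best F1 against any dst cluster.

    Hypothetical helper: builds an inverted index node -> ids of dst clusters
    containing that node, so only overlapping cluster pairs are examined.
    """
    if not src:
        return 0.0
    index: dict = defaultdict(list)
    for j, cluster in enumerate(dst):
        for node in cluster:
            index[node].append(j)

    total = 0.0
    for cluster in src:
        # Accumulate |cluster ∩ dst[j]| for every dst cluster j sharing a node.
        inter: dict = defaultdict(int)
        for node in cluster:
            for j in index.get(node, ()):
                inter[j] += 1
        # Non-overlapping pairs have F1 = 0, so skipping them cannot change the max.
        total += max(
            (f1(count, len(cluster), len(dst[j])) for j, count in inter.items()),
            default=0.0,
        )
    return total / len(src)


def mean_f1(a: list[set], b: list[set]) -> float:
    # Assumed symmetrization: arithmetic mean of the two directional averages.
    return 0.5 * (best_match_avg(a, b) + best_match_avg(b, a))
```

For example, `mean_f1([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}])` evaluates to 0.8 under these assumptions. The work per source cluster is proportional to the sum, over its nodes, of the number of destination clusters containing each node. A naive version instead scans all destination clusters for every source cluster, which is what the index avoids when clusters overlap sparsely.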

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.01691/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1902.01691/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1902.01691/full.md

---
Source: https://tomesphere.com/paper/1902.01691