A comparison of different clustering approaches for high-dimensional presence-absence data

Gabriele d'Angella; Christian Hennig

arXiv:2108.09243 · stat.ME · November 24, 2021

Open Access

TL;DR

This paper compares various clustering methods for high-dimensional presence-absence data, evaluating their performance through extensive simulations based on species distribution models.

Contribution

It provides a systematic comparison of latent class, distance-based hierarchical, and multidimensional-scaling-based clustering approaches for presence-absence data.

Findings

1. Latent class clustering performs well with certain data structures.

2. Distance-based methods are computationally efficient.

3. Multidimensional scaling approaches offer a useful alternative.

Abstract

Presence-absence data consist of vectors or matrices of zeroes and ones, where a one usually indicates a "presence" in a certain place. Presence-absence data occur, for example, when investigating geographical species distributions, genetic information, or the occurrence of certain terms in texts. There are many applications for clustering such data; one example is finding so-called biotic elements, i.e., groups of species that tend to occur together geographically. Presence-absence data can be clustered in various ways, namely using a latent class mixture approach with local independence, distance-based hierarchical clustering with the Jaccard distance, or clustering methods for continuous data applied to a multidimensional scaling representation of the distances. These methods are conceptually very different and therefore cannot easily be compared theoretically. We compare…
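As a rough illustration of the distance-based approach named in the abstract, the sketch below computes the Jaccard distance between presence-absence vectors and runs a naive agglomerative clustering on it. The use of average linkage, the convention that two all-zero vectors have distance 0, the function names `jaccard_distance` and `average_linkage`, and the toy species-by-location matrix are all assumptions made for illustration; they are not taken from the paper, whose exact settings are not visible in this summary.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors.

    Computes 1 - |a AND b| / |a OR b|. Two all-zero vectors are given
    distance 0 here (an assumed convention; the ratio is undefined).
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(data, k):
    """Naive agglomerative clustering with average linkage on Jaccard distances.

    Repeatedly merges the two closest clusters until k remain (O(n^3),
    acceptable only for small illustrative inputs). Returns a sorted list
    of clusters, each a sorted list of row indices.
    """
    n = len(data)
    # Precompute all pairwise Jaccard distances, keyed by (smaller, larger) index.
    d = {(i, j): jaccard_distance(data[i], data[j])
         for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def dist(c1, c2):
        # Average linkage: mean of all pairwise distances between the two clusters.
        total = sum(d[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the pair of current clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy data: rows are species, columns are locations (1 = present).
species = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1],
]

if __name__ == "__main__":
    print(average_linkage(species, 2))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy matrix, species 0 to 2 occur mostly in the first three locations and species 3 to 5 mostly in the last three, so cutting the hierarchy at two clusters recovers two groups of co-occurring species, loosely analogous to the biotic elements mentioned above. The Jaccard distance ignores joint absences, which is why it is a common choice when a shared zero (both species missing from a location) carries little information.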

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research