Hierarchical Data Generator based on Tree-Structured Stick Breaking Process for Benchmarking Clustering Methods
Łukasz P. Olech, Michał Spytkowski, Halina Kwaśnicka, Zbigniew Michalewicz

TL;DR
This paper introduces a novel hierarchical data generator based on the Tree-Structured Stick Breaking process, designed to create synthetic datasets for benchmarking Object Cluster Hierarchy methods, supported by empirical and theoretical analysis.
Contribution
It presents a new data generator for hierarchical clustering benchmarking, with detailed analysis and guidance on parameter control, and provides publicly available datasets.
Findings
Generator produces diverse hierarchical structures
Empirical and theoretical validation of the generator
Datasets mirror common hierarchy types
Abstract
Object Cluster Hierarchies is a new variant of Hierarchical Cluster Analysis that is gaining interest in the field of Machine Learning. Because the approach is still at an early stage of development, the lack of tools for systematic analysis of Object Cluster Hierarchies inhibits its further improvement. In this paper we address this issue by proposing a generator of synthetic hierarchical data that can be used for benchmarking Object Cluster Hierarchy methods. The article presents a thorough empirical and theoretical analysis of the generator and provides guidance on how to control its parameters. The experiments conducted show the usefulness of the data generator, which is capable of producing a wide range of differently structured data. Further, benchmarking datasets that mirror the most common types of hierarchies are generated and made available to the public, together with the developed generator…