Flattening a Hierarchical Clustering through Active Learning

Fabio Vitale; Anand Rajagopalan; Claudio Gentile

arXiv:1906.09458 · cs.LG · October 15, 2019 · 1 citation

Open Access

TL;DR

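This paper studies active learning from pairwise similarity queries over the leaves of a hierarchical clustering tree, with the goal of recovering a flat clustering given by a cut of that tree. It provides query complexity and regret bounds, linear-time algorithms, and preliminary experiments on real-world datasets that compare favorably with passive learning and simple active learning baselines.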

Contribution

It offers a complete characterization of query complexity in the realizable case and introduces linear-time algorithms with theoretical guarantees for active learning in hierarchical clustering.

Findings

01 Algorithms achieve near-optimal query complexity
02 Linear-time implementations are feasible and effective
03 Preliminary experiments show improved performance over baselines

Abstract

We investigate active learning by pairwise similarity over the leaves of trees originating from hierarchical clustering procedures. In the realizable setting, we provide a full characterization of the number of queries needed to achieve perfect reconstruction of the tree cut. In the non-realizable setting, we rely on known importance-sampling procedures to obtain regret and query complexity bounds. Our algorithms come with theoretical guarantees on the statistical error and, more importantly, lend themselves to linear-time implementations in the relevant parameters of the problem. We discuss such implementations, prove running time guarantees for them, and present preliminary experiments on real-world datasets showing the compelling practical performance of our algorithms as compared to both passive learning and simple active learning baselines.
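
To make the realizable setting concrete, the following is a minimal sketch of a naive top-down baseline, not the algorithm analyzed in the paper. It assumes a binary hierarchy, a noiseless oracle, and a ground-truth clustering in which each node of the cut forms its own cluster. Under those assumptions, a node lies at or below the cut exactly when one leaf from its left subtree and one leaf from its right subtree are reported similar. The names `Node`, `any_leaf`, `leaves`, `find_cut`, and `same_cluster` are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A node of a binary hierarchy; only leaves carry an item id."""
    item: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def any_leaf(node: Node) -> int:
    """Return the item id of the leftmost leaf under `node`."""
    while not node.is_leaf():
        node = node.left
    return node.item


def leaves(node: Node) -> List[int]:
    """Return all item ids under `node`, left to right."""
    if node.is_leaf():
        return [node.item]
    return leaves(node.left) + leaves(node.right)


def find_cut(root: Node,
             same_cluster: Callable[[int, int], bool]) -> Tuple[List[List[int]], int]:
    """Naive top-down search for the cut (illustrative baseline only).

    Issues one pairwise query per internal node visited and returns
    the recovered clusters together with the number of queries made.
    """
    clusters: List[List[int]] = []
    queries = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            clusters.append([node.item])
            continue
        queries += 1
        if same_cluster(any_leaf(node.left), any_leaf(node.right)):
            # Both sides agree, so the whole subtree is one cluster.
            clusters.append(leaves(node))
        else:
            # The cut lies strictly below this node; visit left child first.
            stack.append(node.right)
            stack.append(node.left)
    return clusters, queries


# Toy hierarchy ((0, 1), (2, (3, 4))) with hypothetical ground-truth labels.
tree = Node(
    left=Node(left=Node(item=0), right=Node(item=1)),
    right=Node(left=Node(item=2),
               right=Node(left=Node(item=3), right=Node(item=4))),
)
labels = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c"}
clusters, n_queries = find_cut(tree, lambda i, j: labels[i] == labels[j])
print(clusters, n_queries)
```

This baseline spends one query on every internal node it visits, so its cost grows with the number of nodes at or above the cut. It makes no attempt to match the query complexity bounds characterized in the paper, and it does not handle the non-realizable (noisy) setting.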

Taxonomy

Topics: Machine Learning and Algorithms · Data Quality and Management · Stochastic Gradient Optimization Techniques