Flattening a Hierarchical Clustering through Active Learning
Fabio Vitale, Anand Rajagopalan, Claudio Gentile

TL;DR
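This paper studies active learning for flattening a hierarchical clustering: recovering a cut of the tree from pairwise similarity queries over its leaves. It gives query complexity bounds and linear-time algorithms, with preliminary experiments in which the algorithms outperform passive learning and simple active learning baselines.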
Contribution
It fully characterizes the number of queries needed to perfectly reconstruct the tree cut in the realizable case, and introduces algorithms with guarantees on statistical error that lend themselves to linear-time implementations.
Findings
Algorithms achieve near-optimal query complexity
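The algorithms admit implementations that run in time linear in the relevant problem parameters
Preliminary experiments on real-world datasets show improved performance over passive learning and simple active learning baselines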
Abstract
We investigate active learning by pairwise similarity over the leaves of trees originating from hierarchical clustering procedures. In the realizable setting, we provide a full characterization of the number of queries needed to achieve perfect reconstruction of the tree cut. In the non-realizable setting, we rely on known importance-sampling procedures to obtain regret and query complexity bounds. Our algorithms come with theoretical guarantees on the statistical error and, more importantly, lend themselves to linear-time implementations in the relevant parameters of the problem. We discuss such implementations, prove running time guarantees for them, and present preliminary experiments on real-world datasets showing the compelling practical performance of our algorithms compared to both passive learning and simple active learning baselines.
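To make the realizable setting concrete, here is a minimal Python sketch of recovering a tree cut from pairwise similarity queries. It is a naive top-down baseline written for illustration, not the authors' algorithm, and it makes no claim about matching the paper's query complexity. The names `Node`, `recover_cut`, and `same_cluster` are hypothetical, and the sketch assumes a binary tree whose ground-truth clusters are exactly the leaf sets of the nodes on some cut. Under that assumption, if one leaf from the left subtree of a node and one leaf from its right subtree are similar, the whole subtree must lie inside a single cluster.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Node of a binary hierarchy; `label` is set only on leaves (hypothetical type)."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def any_leaf(node: Node) -> str:
    """Return the label of the leftmost leaf under `node`."""
    while not node.is_leaf():
        node = node.left
    return node.label


def leaves(node: Node) -> List[str]:
    """Return all leaf labels under `node`, left to right."""
    if node.is_leaf():
        return [node.label]
    return leaves(node.left) + leaves(node.right)


def recover_cut(
    root: Node, same_cluster: Callable[[str, str], bool]
) -> Tuple[List[List[str]], int]:
    """Top-down search for the cut, assuming the clustering is realized by the tree.

    `same_cluster(a, b)` is the pairwise similarity oracle (an assumption of
    this sketch). Returns the recovered clusters and the number of queries.
    """
    clusters: List[List[str]] = []
    queries = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            # A leaf reached from a node above the cut is its own cluster.
            clusters.append([node.label])
            continue
        queries += 1
        if same_cluster(any_leaf(node.left), any_leaf(node.right)):
            # Leaves from both sides agree, so this node is on the cut.
            clusters.append(leaves(node))
        else:
            # This node is above the cut; visit the left child first.
            stack.append(node.right)
            stack.append(node.left)
    return clusters, queries


# Usage: the tree ((a, b), (c, (d, e))) with true clusters {a, b}, {c}, {d, e}.
tree = Node(
    Node(Node(label="a"), Node(label="b")),
    Node(Node(label="c"), Node(Node(label="d"), Node(label="e"))),
)
truth = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 2}
clusters, queries = recover_cut(tree, lambda x, y: truth[x] == truth[y])
print(clusters, queries)
```

This baseline spends one query per internal node it visits, so its cost grows with the number of nodes on or above the cut. It does not handle the non-realizable setting, where no cut of the tree matches the labels exactly.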
Taxonomy
Topics: Machine Learning and Algorithms · Data Quality and Management · Stochastic Gradient Optimization Techniques
