Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs
David Eppstein

TL;DR
This paper introduces data structures that maintain the closest pair of a changing set of objects under arbitrary distance functions, and applies them to hierarchical clustering, greedy matching, and TSP heuristics. In experiments, the new methods run faster than previously used heuristics.
Contribution
The paper presents data structures for the dynamic closest pair problem that make no geometric assumptions about the objects: they need only a distance function. This extends a technique the author previously used for Euclidean closest pairs to non-geometric settings.
Findings
Algorithms for hierarchical clustering, greedy matching, and TSP heuristics that are experimentally faster than previously used heuristics.
A linear-space data structure with O(n log^2 n) time per update, and a quadratic-space quadtree-like structure with optimal O(n) time per update.
Potential applications in machine learning, Groebner bases, and optimization.
Abstract
We develop data structures for dynamic closest pair problems with arbitrary distance functions, that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.
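To make the problem statement concrete, here is a minimal Python sketch of a dynamic closest pair structure for an arbitrary distance function. It is not the paper's O(n log^2 n) or quadtree-like structure. It is a simple baseline, assumed here for illustration, in which every object stores a pointer to its nearest neighbor. The names `NeighborClosestPair` and `greedy_matching` are hypothetical, and the sketch assumes a symmetric distance function and hashable objects. Insertion takes O(n) distance evaluations, but a deletion can take O(n^2) in the worst case, since every object that pointed at the deleted one must rescan the set.

```python
import math


class NeighborClosestPair:
    """Baseline sketch: each object remembers its nearest neighbor.

    Assumes `dist` is symmetric and objects are hashable and distinct.
    """

    def __init__(self, dist, objects=()):
        self.dist = dist
        self.nn = {}  # object -> (distance to nearest neighbor, that neighbor)
        for obj in objects:
            self.insert(obj)

    def _nearest(self, p):
        """Scan all other objects for the one closest to p."""
        best_d, best_q = math.inf, None
        for q in self.nn:
            if q != p:
                d = self.dist(p, q)
                if d < best_d:
                    best_d, best_q = d, q
        return best_d, best_q

    def insert(self, p):
        """Add p, updating any object whose nearest neighbor becomes p."""
        best_d, best_q = math.inf, None
        for q, (dq, _) in list(self.nn.items()):
            d = self.dist(p, q)
            if d < dq:
                self.nn[q] = (d, p)
            if d < best_d:
                best_d, best_q = d, q
        self.nn[p] = (best_d, best_q)

    def delete(self, p):
        """Remove p and recompute neighbors of objects that pointed at p."""
        del self.nn[p]
        for q in [q for q, (_, r) in self.nn.items() if r == p]:
            self.nn[q] = self._nearest(q)

    def closest_pair(self):
        """Return (distance, a, b) for the closest pair, or None."""
        if len(self.nn) < 2:
            return None
        q, (d, r) = min(self.nn.items(), key=lambda kv: kv[1][0])
        return d, q, r


def greedy_matching(dist, objects):
    """Repeatedly match and remove the closest remaining pair."""
    cp = NeighborClosestPair(dist, objects)
    matching = []
    while len(cp.nn) >= 2:
        _, p, q = cp.closest_pair()
        matching.append((p, q))
        cp.delete(p)
        cp.delete(q)
    return matching
```

For example, `greedy_matching(lambda a, b: abs(a - b), [0, 10, 11, 25, 27, 40])` pairs 10 with 11, then 25 with 27, then 0 with 40. Hierarchical clustering follows the same pattern, except that after removing the closest pair of clusters the merged cluster is inserted back as a new object, which is why the cost of a single insertion or deletion dominates the running time.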
