ThetA -- fast and robust clustering via a distance parameter
Eleftherios Garyfallidis, Shreyas Fadnavis, Jong Sung Park, Bramsh Qamar Chandio, Javier Guaje, Serge Koudoro, Nasim Anousheh

TL;DR
ThetA introduces a novel distance threshold-based clustering method that improves accuracy and efficiency, especially in high-dimensional data, by simplifying the threshold learning process compared to traditional K-based methods.
Contribution
ThetA presents a new set of distance threshold algorithms that outperform existing clustering methods in accuracy and speed, with easier threshold learning and dataset sparsity inference.
Findings
Outperforms existing methods in accuracy and time complexity
Simplifies threshold learning compared to learning the number of clusters
Infers dataset sparsity in high dimensions
Abstract
Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades. This set of problems is often tackled by partitioning the data into K clusters, where the number of clusters is chosen a priori. While significant progress has been made along these lines over the years, it is well established that as the number of clusters or dimensions increases, current approaches get stuck in local minima, resulting in suboptimal solutions. In this work, we propose a new set of distance threshold methods called Theta-based Algorithms (ThetA). Via experimental comparisons and complexity analyses, we show that our proposed approach outperforms existing approaches in: a) clustering accuracy and b) time complexity. Additionally, we show that for a large class of problems, learning the optimal threshold is straightforward in comparison to learning K.…
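To make the contrast with K-based partitioning concrete, here is a minimal sketch of a generic leader-style threshold clustering. It is not the ThetA algorithm, which the truncated abstract does not specify; the function name `threshold_clustering`, the parameter `theta`, the Euclidean metric, and the running-mean centroid update are all assumptions chosen for illustration. The point it shows is that the number of clusters is not supplied in advance but follows from the distance threshold.

```python
import math


def threshold_clustering(points, theta):
    """Generic leader-style clustering (illustrative only, NOT the paper's method).

    Each point joins the nearest existing cluster if its Euclidean distance
    to that cluster's centroid is at most `theta`; otherwise it starts a new
    cluster. The number of clusters is an output, not an input.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points assigned to each cluster
    labels = []     # cluster index for each input point
    for p in points:
        if centroids:
            dists = [math.dist(p, c) for c in centroids]
            j = min(range(len(dists)), key=dists.__getitem__)
            if dists[j] <= theta:
                counts[j] += 1
                # Incremental mean update of the chosen centroid.
                centroids[j] = [c + (x - c) / counts[j]
                                for c, x in zip(centroids[j], p)]
                labels.append(j)
                continue
        # No cluster is close enough: open a new one seeded at this point.
        centroids.append([float(x) for x in p])
        counts.append(1)
        labels.append(len(centroids) - 1)
    return labels, centroids


# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
labels, centroids = threshold_clustering(points, theta=1.0)
print(labels)          # [0, 0, 1, 1, 2]
print(len(centroids))  # 3
```

In this sketch a smaller `theta` yields more, tighter clusters and a larger one yields fewer, so a single distance parameter stands in for K. Single-pass schemes like this one are order-dependent, which is one reason the sketch should not be read as representative of the paper's results.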
Taxonomy
Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research · Image Retrieval and Classification Techniques
