An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering
Kun Li, Liang Yuan, Yunquan Zhang, Gongwei Chen

TL;DR
This paper introduces a large-scale regression method based on best friend clustering. It improves accuracy and efficiency by combining hierarchical clustering with hybrid parallelism, and it targets high-performance computing environments.
Contribution
The paper presents a new data structure and clustering strategy that improve parallel regression performance, accuracy, and scalability without complex hyperparameter tuning.
Findings
Achieves faster convergence and higher accuracy in large-scale regression
Demonstrates scalability across multiple cores and distributed systems
Provides a simple yet effective hierarchical clustering framework
Abstract
As data sizes in machine learning grow exponentially, computation must be accelerated by exploiting the ever-growing number of cores provided by high-performance computing hardware. However, existing parallel methods for clustering or regression often suffer from low accuracy, slow convergence, and complex hyperparameter tuning. Furthermore, parallel efficiency is usually difficult to improve while balancing the preservation of model properties against the partitioning of computing workloads on distributed systems. In this paper, we propose a novel and simple data structure capturing the most important information among data samples. It has several advantageous properties supporting a hierarchical clustering strategy that is independent of the hardware parallelism, well-defined metrics for determining optimal clustering, balanced…
Taxonomy
Topics: Advanced Clustering Algorithms Research · Machine Learning and Data Classification · Text and Document Classification Technologies
