An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering

Kun Li; Liang Yuan; Yunquan Zhang; Gongwei Chen

arXiv:2104.10819·cs.LG·April 23, 2021

Open Access

TL;DR

This paper introduces a large-scale regression method based on best friend clustering. It improves accuracy and efficiency by combining hierarchical clustering with hybrid parallelism, which makes it suitable for high-performance computing environments.

Contribution

The paper presents a new data structure and clustering strategy that improve parallel regression performance, accuracy, and scalability without complex hyperparameter tuning.

Findings

01

Achieves faster convergence and higher accuracy in large-scale regression

02

Demonstrates scalability across multiple cores and distributed systems

03

Provides a simple yet effective hierarchical clustering framework

Abstract

As data sizes in machine learning grow exponentially, computation must be accelerated by exploiting the ever-growing number of cores provided by high-performance computing hardware. However, existing parallel methods for clustering or regression often suffer from low accuracy, slow convergence, and complex hyperparameter tuning. Furthermore, parallel efficiency is usually difficult to improve while balancing the preservation of model properties against the partitioning of computing workloads on distributed systems. In this paper, we propose a novel and simple data structure that captures the most important information among data samples. It has several advantageous properties supporting a hierarchical clustering strategy that is independent of the hardware parallelism, well-defined metrics for determining optimal clustering, balanced…
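The truncated abstract does not define the proposed data structure, so the following is only an illustrative sketch under a stated assumption: it treats the "best friend" of a sample as its nearest neighbor and merges samples that are each other's nearest neighbor. This is plain mutual nearest-neighbor grouping, a swapped-in technique, and not necessarily the paper's method. The function names `nearest_neighbor` and `mutual_nn_clusters` are hypothetical, and the sketch performs a single merging level rather than a full hierarchy.

```python
import math


def nearest_neighbor(points, i):
    """Return the index of the point closest to points[i].

    Ties go to the lowest index. Requires at least two points.
    """
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def mutual_nn_clusters(points):
    """Group samples that are each other's nearest neighbor.

    ASSUMPTION: "best friend" is read here as "mutual nearest neighbor".
    This is an illustrative stand-in, not the paper's definition.
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one node per sample

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    best = [nearest_neighbor(points, i) for i in range(n)]
    for i, j in enumerate(best):
        if best[j] == i:  # i and j pick each other: merge them
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(mutual_nn_clusters(samples))
```

Because each sample's nearest neighbor can be computed independently of the others, this first step partitions naturally across cores, which is the kind of property the abstract emphasizes. The brute-force neighbor search here is quadratic in the number of samples and is chosen for clarity only.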

Taxonomy

Topics: Advanced Clustering Algorithms Research · Machine Learning and Data Classification · Text and Document Classification Technologies