Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression

Jingyi Zhang; Xiaoxiao Sun

arXiv:2107.05834·stat.ML·November 11, 2021

TL;DR

This paper introduces a response-adaptive partition and oversampling strategy for divide-and-conquer kernel ridge regression (dacKRR) that handles highly skewed response variables and improves estimation accuracy.

Contribution

The paper proposes a new algorithm that combines response-adaptive partitioning with oversampling for skewed responses in dacKRR, and it provides theoretical guarantees and practical guidance.

Findings

01 Smaller risk than classical dacKRR under mild conditions

02 Effective oversampling improves estimation in skewed response scenarios

03 Validated by simulations and real-data analyses

Abstract

The divide-and-conquer method has been widely used for computing kernel ridge regression estimates on large-scale data. Unfortunately, when the response variable is highly skewed, divide-and-conquer kernel ridge regression (dacKRR) may overlook the underrepresented region and produce unacceptable results. We combine a novel response-adaptive partition strategy with the oversampling technique to overcome this limitation. The proposed algorithm allocates some carefully identified informative observations to multiple nodes (local processors). Although the oversampling technique has been widely used for addressing discrete label skewness, extending it to the dacKRR setting is nontrivial. We provide both theoretical and practical guidance on how to effectively oversample the observations under the dacKRR setting. Furthermore, we show the proposed estimate has…
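To make the setting concrete, the following is a minimal sketch of classical dacKRR (fit a kernel ridge regression on each node, then average the local predictions) with an optional set of observations copied to every node. It is not the authors' algorithm: the RBF kernel, the helper names (`fit_krr`, `dac_krr`, `predict_dac`), the tuning values, and the rule that shares the top 5% of responses are all assumptions made for illustration, and the paper's response-adaptive partition is replaced here by a plain random split.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_krr(X, y, lam=1e-3, gamma=1.0):
    # Local KRR fit: solve (K + n * lam * I) alpha = y.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def predict_krr(model, X_new, gamma=1.0):
    X_train, alpha = model
    return rbf_kernel(X_new, X_train, gamma) @ alpha

def dac_krr(X, y, n_nodes, shared_idx=None, lam=1e-3, gamma=1.0, seed=0):
    # Randomly split the non-shared observations across nodes; observations in
    # shared_idx (hypothetical "informative" points) are copied to every node.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    shared = np.array([], dtype=int) if shared_idx is None else np.asarray(shared_idx, dtype=int)
    rest = np.setdiff1d(np.arange(n), shared)
    rng.shuffle(rest)
    models = []
    for part in np.array_split(rest, n_nodes):
        idx = np.concatenate([part, shared])
        models.append(fit_krr(X[idx], y[idx], lam, gamma))
    return models

def predict_dac(models, X_new, gamma=1.0):
    # Divide-and-conquer estimate: the average of the local predictions.
    return np.mean([predict_krr(m, X_new, gamma) for m in models], axis=0)

# Synthetic example with a skewed response: most y values are near zero,
# and a narrow bump around x = 0.5 produces the few large ones.
rng = np.random.default_rng(0)
n, n_nodes = 400, 4
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = np.exp(-200.0 * (X[:, 0] - 0.5) ** 2) + 0.05 * rng.standard_normal(n)

# Assumed selection rule (not from the paper): share the top 5% of responses.
shared = np.flatnonzero(y > np.quantile(y, 0.95))

plain = dac_krr(X, y, n_nodes, gamma=50.0)
oversampled = dac_krr(X, y, n_nodes, shared_idx=shared, gamma=50.0)

X_grid = np.linspace(0.0, 1.0, 101)[:, None]
pred_plain = predict_dac(plain, X_grid, gamma=50.0)
pred_over = predict_dac(oversampled, X_grid, gamma=50.0)
```

Copying the shared points to every node means each local fit sees the underrepresented region, at the cost of slightly larger local problems. How to choose those points and how many copies to allocate is the question the paper addresses; the quantile rule above is only a placeholder.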

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques