Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression
Jingyi Zhang, Xiaoxiao Sun

TL;DR
This paper introduces a novel response-adaptive partition and oversampling strategy for divide-and-conquer kernel ridge regression to effectively handle highly skewed response variables, improving estimation accuracy.
Contribution
It proposes a new algorithm combining adaptive partitioning and oversampling for skewed responses in dacKRR, with theoretical guarantees and practical guidance.
Findings
Smaller risk than classical dacKRR under mild conditions
Effective oversampling improves estimation in skewed response scenarios
Validated by simulations and real-data analyses
Abstract
The divide-and-conquer method has been widely used for computing large-scale kernel ridge regression estimates. Unfortunately, when the response variable is highly skewed, divide-and-conquer kernel ridge regression (dacKRR) may overlook the underrepresented region and produce unacceptable results. We combine a novel response-adaptive partition strategy with the oversampling technique to overcome this limitation. Through the proposed algorithm, we allocate some carefully identified informative observations to multiple nodes (local processors). Although the oversampling technique has been widely used for addressing discrete label skewness, extending it to the dacKRR setting is nontrivial. We provide both theoretical and practical guidance on how to effectively oversample the observations under the dacKRR setting. Furthermore, we show the proposed estimate has…
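To make the idea in the abstract concrete, here is a minimal sketch of divide-and-conquer KRR in which observations from the sparse tail of the response are copied to every node. It is an illustration under stated assumptions, not the authors' algorithm: the quantile rule for picking "informative" observations, the RBF kernel, and all names (`dac_krr_oversampled`, `tail_q`, `lam`, `gamma`) are hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Squared Euclidean distances between rows of A and rows of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def krr_fit(X, y, lam, gamma):
    # Standard KRR: solve (K + n * lam * I) alpha = y on one node's data.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def krr_predict(model, X_new, gamma):
    X_train, alpha = model
    return rbf_kernel(X_new, X_train, gamma) @ alpha

def dac_krr_oversampled(X, y, m, lam=1e-3, gamma=1.0, tail_q=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # ASSUMPTION: responses above the tail_q quantile stand in for the
    # underrepresented region; the paper's selection rule may differ.
    rare = np.flatnonzero(y > np.quantile(y, tail_q))
    common = np.setdiff1d(np.arange(n), rare)
    # Split the common observations at random across the m nodes.
    parts = np.array_split(rng.permutation(common), m)
    models = []
    for idx in parts:
        # Oversampling step: every node also receives all rare observations.
        node_idx = np.concatenate([idx, rare])
        models.append(krr_fit(X[node_idx], y[node_idx], lam, gamma))
    return models

def dac_predict(models, X_new, gamma=1.0):
    # Divide-and-conquer estimate: average the local KRR predictions.
    return np.mean([krr_predict(mod, X_new, gamma) for mod in models], axis=0)

# Usage on synthetic right-skewed responses.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.exp(4.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)
models = dac_krr_oversampled(X, y, m=4)
pred = dac_predict(models, X[:5])
```

Each node fits an ordinary KRR on its own share of the common observations plus the shared tail observations, so no local estimate is fit without data from the sparse region. The `gamma` passed to `dac_predict` must match the one used when fitting.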
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques
