On the Feasibility of Distributed Kernel Regression for Big Data
Chen Xu, Yongquan Zhang, Runze Li

TL;DR
This paper investigates the theoretical validity of distributed kernel regression for big data, demonstrating that with proper segmentation, the method achieves consistent generalization performance, supported by simulations and real data.
Contribution
It provides the first theoretical analysis confirming the consistency of distributed kernel regression under big data conditions.
Findings
Distributed kernel regression is generalization consistent with proper segmentation.
Theoretical bounds are established for the generalization error.
Simulation and real data examples support the method's effectiveness.
Abstract
In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then averaged from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown whether such a distributive strategy provides valid theoretical inferences to the original data. In this paper, we address this fundamental issue for the distributed kernel regression (DKR), where the algorithmic feasibility is measured by the generalization performance of the resulting estimator. To justify DKR, a uniform convergence rate is needed for bounding the generalization error over the individual outputs, which brings new and…
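The divide-and-conquer scheme described in the abstract (split the data into segments, fit each segment separately, average the individual outputs) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel, the ridge-type regularization, the parameter names `lam`, `bandwidth`, and `n_segments`, and the synthetic sine data.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def fit_krr(X, y, lam=1e-3, bandwidth=0.5):
    """Kernel ridge regression on one segment: solve (K + n*lam*I) alpha = y."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def predict_krr(model, X_new, bandwidth=0.5):
    """Evaluate a fitted segment estimator at new points."""
    X_train, alpha = model
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha

def distributed_krr_predict(X, y, X_new, n_segments=10, lam=1e-3, bandwidth=0.5, seed=0):
    """Randomly split the data into segments, fit each one, and average the predictions."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    segments = np.array_split(perm, n_segments)
    preds = [
        predict_krr(fit_krr(X[idx], y[idx], lam, bandwidth), X_new, bandwidth)
        for idx in segments
    ]
    return np.mean(preds, axis=0)

# Synthetic example (hypothetical data): noisy sine curve on [-1, 1].
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(2000, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(2000)
X_test = np.linspace(-0.9, 0.9, 50)[:, None]

y_hat = distributed_krr_predict(X, y, X_test, n_segments=10)
rmse = np.sqrt(np.mean((y_hat - np.sin(np.pi * X_test[:, 0])) ** 2))
print(f"RMSE of the averaged estimator against the true function: {rmse:.4f}")
```

Each segment requires solving a linear system of size roughly n / n_segments rather than n, which is where the computational saving comes from. How many segments can be used before the averaged estimator loses accuracy is the segmentation question the paper studies.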
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
