Data-driven confidence bands for distributed nonparametric regression
Valeriy Avanesov

TL;DR
This paper introduces a new data-driven method for constructing confidence bands in distributed nonparametric regression, providing valid uncertainty quantification and optimal bounds for large datasets.
Contribution
It proposes a novel, computationally efficient algorithm for uncertainty quantification in distributed nonparametric regression, with rigorous validity and minimax-optimal bounds.
Findings
Valid frequentist $L_2$-confidence bands are constructed.
The method is computationally efficient and suitable for large datasets.
A minimax-optimal high-probability bound for the estimator is established.
Abstract
Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity, which renders them inapplicable to modern massive datasets. To address this, a number of approximations have been suggested, some of them allowing for a distributed implementation. One of them is the divide and conquer approach: splitting the data into a number of partitions, obtaining the local estimates, and finally averaging them. In this paper we suggest a novel, computationally efficient, fully data-driven algorithm that quantifies the uncertainty of this method, yielding frequentist $L_2$-confidence bands. We rigorously demonstrate the validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds.
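The divide and conquer estimator described in the abstract can be illustrated with a minimal sketch. This is not the paper's confidence-band algorithm; it only shows the split, local fit, and averaging steps for Kernel Ridge Regression. The squared-exponential kernel, the lengthscale, the penalty `lam`, the synthetic data, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between 1-D input arrays (assumed kernel choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def fit_krr(x, y, lam=1e-3):
    # Local Kernel Ridge Regression: solve (K + n * lam * I) alpha = y.
    n = len(x)
    K = rbf_kernel(x, x)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return x, alpha

def predict_krr(model, x_new):
    # Evaluate a fitted local estimate at new inputs.
    x_train, alpha = model
    return rbf_kernel(x_new, x_train) @ alpha

def divide_and_conquer_krr(x, y, n_parts, x_new, seed=0):
    # Randomly split the data, fit one estimate per partition, then average.
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), n_parts)
    local_preds = [predict_krr(fit_krr(x[p], y[p]), x_new) for p in parts]
    return np.mean(local_preds, axis=0)

# Synthetic example (hypothetical data): noisy sine curve on [0, 1].
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(2000)
x_new = np.linspace(0.1, 0.9, 50)
f_bar = divide_and_conquer_krr(x, y, n_parts=10, x_new=x_new)
```

Each partition solves a linear system of size 200 instead of one system of size 2000, and the local fits are independent, so they can run on separate machines before the averaging step.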
