The Optimality of Kernel Classifiers in Sobolev Space
Jianfa Lai, Zhifan Li, Dongming Huang, Qian Lin

TL;DR
This paper analyzes the statistical performance of kernel classifiers in Sobolev spaces, establishing their optimality and extending insights to neural networks, with practical methods for estimating smoothness from data.
Contribution
It provides the first minimax lower bounds for kernel classifiers in Sobolev spaces and demonstrates their optimality, also extending results to neural network classifiers.
Findings
Derived an upper bound on classification excess risk.
Established minimax lower bounds showing optimality.
Proposed a practical method to estimate smoothness from data.
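For reference, the quantities named in these findings can be written out under a standard binary-classification setup. The notation below ($\hat f_n$, $\mathcal{E}$, $\mathcal{P}$, $\varepsilon_n$, $c$, $C$) is assumed for illustration and is not taken from this summary; the paper's own statements (for example, in probability rather than in expectation) may differ in form.

```latex
% Assumed setup: labels Y in {-1, 1}, classifier x -> sign(f(x)).
% Excess risk of an estimator \hat f_n relative to the best achievable risk:
\[
  \mathcal{E}(\hat f_n)
  = \mathbb{P}\bigl(\operatorname{sign}(\hat f_n(X)) \neq Y\bigr)
  - \inf_{f} \mathbb{P}\bigl(\operatorname{sign}(f(X)) \neq Y\bigr).
\]
% A rate \varepsilon_n is minimax optimal over a class \mathcal{P} of
% distributions when an upper bound for one estimator matches a lower
% bound over all estimators, up to constants:
\[
  \sup_{P \in \mathcal{P}} \mathbb{E}\,\mathcal{E}(\hat f_n) \le C\,\varepsilon_n,
  \qquad
  \inf_{\tilde f_n} \sup_{P \in \mathcal{P}} \mathbb{E}\,\mathcal{E}(\tilde f_n) \ge c\,\varepsilon_n .
\]
```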
Abstract
Kernel methods are widely used in machine learning, especially for classification problems. However, the theoretical analysis of kernel classification is still limited. This paper investigates the statistical performance of kernel classifiers. With some mild assumptions on the conditional probability $\eta(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper bound on the classification excess risk of a kernel classifier using recent advances in the theory of kernel regression. We also obtain a minimax lower bound for Sobolev spaces, which shows the optimality of the proposed classifier. Our theoretical results can be extended to the generalization error of overparameterized neural network classifiers. To make our theoretical results more applicable in realistic settings, we also propose a simple method to estimate the interpolation smoothness of $2\eta(x)-1$ and apply the method to real…
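As a rough illustration of the kind of classifier the abstract describes, the sketch below fits a kernel regression to labels in {-1, 1} and classifies by the sign of the fitted function. It is a minimal sketch under stated assumptions, not the authors' implementation: kernel ridge regression is used as one example of a regression-based (spectral) estimator, the Laplace kernel is an illustrative stand-in, and the function names, the parameters `lam` and `gamma`, and the synthetic conditional probability `eta` are all hypothetical.

```python
import numpy as np


def laplace_kernel(A, B, gamma=1.0):
    """Laplace kernel k(a, b) = exp(-gamma * ||a - b||_2) (illustrative choice)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-gamma * np.linalg.norm(diff, axis=2))


def fit_kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + n * lam * I) alpha = y for the kernel ridge coefficients."""
    n = X.shape[0]
    K = laplace_kernel(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)


def predict_labels(X_train, alpha, X_new, gamma=1.0):
    """Plug-in classifier: the sign of the fitted regression function."""
    scores = laplace_kernel(X_new, X_train, gamma) @ alpha
    return np.where(scores >= 0, 1, -1)


# Synthetic data with a hypothetical conditional probability eta(x) = P(Y=1 | X=x).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
eta = 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))
y = np.where(rng.uniform(size=200) < eta, 1.0, -1.0)

# The regression target is E[Y | X = x] = 2 * eta(x) - 1; its sign is the Bayes rule.
alpha = fit_kernel_ridge(X, y)
preds = predict_labels(X, alpha, np.array([[0.9, 0.0], [-0.9, 0.0]]))
print(preds)
```

Regressing on the ±1 labels with the square loss targets $2\eta(x)-1$, so taking the sign of the fit approximates the Bayes classifier; this is why excess-risk bounds for the classifier can be derived from kernel regression rates.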
Peer Reviews
Decision: ICLR 2024 poster
The paper is well written and easy to follow. The main results seem novel for the setup considered. The association of the estimated smoothness to the difficulty of the datasets is interesting.
see questions.
* Studying the optimality of kernel classifiers is a fundamental problem in machine learning.
* The theory in this paper has the potential to guide the practice of kernel learning and neural networks.
The primary concern regarding this paper is that the established minimax optimality for kernel classifiers relies on the gradient flow algorithm, which is mainly based on the L2 loss and is not commonly used in practical applications of building kernel classifiers. While the minimax rate is established, its optimality is only proven in an asymptotic sense, leaving a considerable gap between theory and practical usage. This approach is natural in the existing work when building the minimax rate f
The paper has a relatively clear message, and I think the paper is relatively well-written. I guess that the upper bound on the convergence rate in Theorem 2 should be a relatively direct corollary from Fischer and Steinwart (2020) in the special case of ridge regularization, but the authors formulate a more general result for spectral algorithms that also holds for gradient flow. The lower bound in Theorem 1 nicely complements the upper bound. While the square loss is not very common for classi
The application to the NTK looks nice, but I have some concerns that Assumption 3 cannot be verified for a good $\alpha_0$, see the major comments below. While the smoothness assumption on $f^*$ seems less natural to me in the classification setting, the authors provide interesting experiments that appear to show that this assumption is appropriate. However, I have doubts that the proposed (and not theoretically justified) estimator can really estimate what it is supposed to estimate, see the m
Taxonomy
Topics: Advanced Research in Science and Engineering · Image and Signal Denoising Methods · Numerical methods in inverse problems
