TL;DR
This paper learns kernels from data by scoring how accurately each kernel interpolates when the number of interpolation points is halved. The approach makes it possible to train very deep kernel networks that classify with minimal data and learn class archetypes.
Contribution
The paper proposes a data-driven method for constructing and selecting kernels based on how stable their interpolants remain when interpolation points are removed. The method is used to train deep kernel networks and to study their properties.
Findings
Deep kernel networks can classify with one data point per class.
They learn class archetypes and expand inter-class distances.
The method outperforms traditional CNN training with dropout.
Abstract
Learning can be seen as approximating an unknown function by interpolating the training data. Kriging offers a solution to this problem based on the prior specification of a kernel. We explore a numerical approximation approach to kernel selection/construction based on the simple premise that a kernel must be good if the number of interpolation points can be halved without significant loss in accuracy (measured using the intrinsic RKHS norm associated with the kernel). We first test and motivate this idea on a simple problem of recovering the Green's function of an elliptic PDE (with inhomogeneous coefficients) from the sparse observation of one of its solutions. Next we consider the problem of learning non-parametric families of deep kernels of the form with and $G_{n+1} \in…
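The halving premise in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel with a single lengthscale, a small nugget added for numerical stability, and a uniformly random half-sample; the function names and parameters are illustrative and not taken from the paper. It relies on the standard identity that the squared RKHS norm of the minimum-norm interpolant of data (X, y) is y^T K^{-1} y, so the relative loss from halving is 1 minus the ratio of the two norms.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def rkhs_norm_sq(X, y, lengthscale, nugget=1e-6):
    """Squared RKHS norm of the interpolant of (X, y): y^T K^{-1} y.

    The nugget is an assumption of this sketch, added to keep the solve stable.
    """
    K = gaussian_kernel(X, X, lengthscale) + nugget * np.eye(len(X))
    return float(y @ np.linalg.solve(K, y))

def halving_loss(X, y, lengthscale, rng, nugget=1e-6):
    """Relative RKHS-norm loss when a random half of the points is kept.

    Returns 1 - ||u_half||^2 / ||u_full||^2, where u_full interpolates all
    points and u_half interpolates the random half.
    """
    n = len(X)
    half = rng.choice(n, size=n // 2, replace=False)
    full = rkhs_norm_sq(X, y, lengthscale, nugget)
    sub = rkhs_norm_sq(X[half], y[half], lengthscale, nugget)
    return 1.0 - sub / full

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 1.0, 20)[:, None]   # 20 points in one dimension
    y = np.sin(2.0 * np.pi * X[:, 0])        # toy target values
    for ell in (0.05, 0.3, 1.0):
        print(ell, halving_loss(X, y, ell, rng))
```

Under these assumptions, a value near 0 suggests that dropping half the points changes the interpolant very little in the kernel's own norm, while a value near 1 suggests the kernel depends heavily on the removed points. Comparing the score across candidate lengthscales is one simple way to use it for kernel selection.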
