Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry

Naoki Yoshida; Isao Ishikawa; Masaaki Imaizumi

arXiv:2512.06347·cs.LG·December 9, 2025

Open Access · 3 Reviews

TL;DR

This paper proves that in a teacher-student setting, the generalization error of random interpolators becomes zero once training samples surpass a geometric threshold, using algebraic geometry tools.

Contribution

It provides a theoretical proof that random interpolators can achieve zero generalization error based on geometric properties, advancing understanding of model generalization.

Findings

01. Generalization error becomes zero beyond a data threshold

02. Algebraic geometry characterizes interpolator geometry

03. Supports empirical observations of effective random interpolators
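
The threshold behaviour described in the findings can be illustrated with a minimal sketch in the simplest teacher-student case, a linear model $f(w, x) = w^\top x$ with standard normal inputs. This is not the paper's construction: the helper names (`random_interpolator`, `generalization_error`, `run`) are hypothetical, and drawing Gaussian coefficients on the null space of the data matrix is an assumed sampling scheme for "random interpolators". In this toy case the threshold is simply the number of parameters $d$; the geometric threshold the paper derives for general models is not reproduced here.

```python
import numpy as np

def random_interpolator(X, y, rng):
    """Sample one w with X @ w == y (assumed scheme: min-norm solution
    plus a Gaussian-weighted combination of null-space directions)."""
    w_min, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Rows of vt beyond the numerical rank span the null space of X.
    _, s, vt = np.linalg.svd(X, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s.max()))
    null_basis = vt[rank:].T                      # shape (d, d - rank)
    coeffs = rng.standard_normal(null_basis.shape[1])
    return w_min + null_basis @ coeffs

def generalization_error(w, w_star):
    """For x ~ N(0, I): E[(w.x - w_star.x)^2] = ||w - w_star||^2."""
    return float(np.sum((w - w_star) ** 2))

def run(d=5, seed=0):
    """Return {n: generalization error of one random interpolator}."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)               # frozen teacher weights
    errors = {}
    for n in range(1, 2 * d + 1):
        X = rng.standard_normal((n, d))           # n noiseless training inputs
        y = X @ w_star                            # teacher labels
        w = random_interpolator(X, y, rng)
        errors[n] = generalization_error(w, w_star)
    return errors

if __name__ == "__main__":
    for n, err in run().items():
        print(f"n={n:2d}  error={err:.3e}")
```

For $n < d$ the interpolators form an affine subspace of positive dimension, so a random draw almost surely differs from the teacher; for $n \ge d$ the data matrix almost surely has full column rank, the teacher is the only interpolator, and the error drops to numerical zero.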

Abstract

We theoretically demonstrate that the generalization error of interpolators for machine learning models under teacher-student settings becomes 0 once the number of training samples exceeds a certain threshold. Understanding the high generalization ability of large-scale models such as deep neural networks (DNNs) remains one of the central open problems in machine learning theory. While recent theoretical studies have attributed this phenomenon to the implicit bias of stochastic gradient descent (SGD) toward well-generalizing solutions, empirical evidence indicates that it primarily stems from properties of the model itself. Specifically, even randomly sampled interpolators, which are parameters that achieve zero training error, have been observed to generalize effectively. In this study, under a teacher-student framework, we prove that the generalization error of randomly sampled…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The problem formulation addresses generalization - one of the cornerstones of modern machine learning. The approach is novel and interesting.

Weaknesses

The resulting bounds appear to be vacuous for overparameterized models, even though analyzing the latter serves as the motivation in the introduction. To corroborate further, consider the simplest case where the teacher and student networks match each other and are given by $f(w, x) = w^\top x$, where $x, w \in \mathbb{R}^d$. Denote the frozen weight parameter of the teacher network by $w_*$. Take $x$ to be i.i.d. standard normal. Then, first of all, the only zero generalization error w

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- The authors propose a model-based analysis of the generalization error of interpolators in a teacher-student setting. This offers an interesting perspective on generalization, showing that perfect generalisation can be reached when there is structure in the data distribution and the model is compatible with it (in the sense that it can interpolate).
- The paper uses tools from algebraic geometry, suggesting new links between this field and statistical learning theory.
- The literature rev

Weaknesses

*Main weaknesses:*
- The introduction puts an emphasis on models that "employ an excessive number of parameters". However, the proposed theory states that the strong sample complexity is bounded by the number of parameters of the student network. This bound seems to suggest that not too much overparameterization is allowed in order to obtain zero generalization error. Even in Theorem 6, the derived sample complexity seems to be $k = O(\sqrt{d_\Theta})$, which allows some but not arbitrary overp

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- **Clear and Well-Written Presentation:** The paper is generally well-written, and the results are clean.
- **Innovative Theoretical Contribution:** The paper rigorously analyzes the generalization properties of interpolators using tools from algebraic geometry. The derived results are both elegant and insightful. This paper is solid.
- **Empirical Validation:** Experimental results on synthetic regression tasks and the MNIST dataset back up the theoretical findings, showing that the predicted

Weaknesses

- **Strong & Restrictive Assumptions:** The analysis is carried out in a controlled, noiseless teacher–student setting, and the student model is assumed to be real analytic. In practical scenarios, these assumptions may not hold, e.g., the teacher model is a ReLU MLP.
- **Limited Applicability to General Machine Learning Settings:** The theoretical results depend on the assumption that the teacher and student models belong to the same parametric function class; more precisely, Assumption 2 requir

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks · Gaussian Processes and Bayesian Inference