Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato

TL;DR
This paper investigates the effectiveness of Takeuchi's information criterion (TIC) as a measure of generalization in deep neural networks, especially near the neural tangent kernel (NTK) regime, through theoretical analysis and extensive experiments.
Contribution
It establishes theoretical conditions under which TIC explains DNN generalization and empirically validates TIC's correlation with generalization gaps near the NTK regime.
Findings
TIC correlates well with generalization gaps near the NTK regime
Outside the NTK regime, TIC's correlation with the generalization gap diminishes
TIC improves trial pruning for hyperparameter optimization
Abstract
Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishing a reliable generalization measure for statistically singular models such as deep neural networks (DNNs) is difficult due to their complex nature. This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of DNNs. Importantly, the developed theory indicates the applicability of TIC near the neural tangent kernel (NTK) regime. In a series of experiments, we trained more than 5,000 DNN models with 12 architectures, including large models (e.g., VGG-16), on four datasets, and estimated the corresponding TIC values to examine the relationship between the generalization gap and the TIC estimates. We applied several TIC…
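For orientation, here is a minimal sketch of the quantity involved, assuming the classical form of the TIC penalty: tr(H^{-1} C) / n, where H is the Hessian of the mean training loss at the fitted parameters, C is the second moment of the per-sample gradients, and n is the number of training samples. The helper names `solve` and `tic_penalty` are hypothetical, and the toy model is a two-parameter linear regression rather than a DNN. The approximations the paper applies at DNN scale are not reproduced here.

```python
import random

def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]  # augmented [a | b]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def tic_penalty(grads, hessian):
    """Return tr(H^{-1} C) / n, with C the second moment of per-sample gradients."""
    n, d = len(grads), len(hessian)
    cov = [[sum(g[i] * g[j] for g in grads) / n for j in range(d)]
           for i in range(d)]
    x = solve(hessian, cov)  # x = H^{-1} C
    return sum(x[i][i] for i in range(d)) / n

# Toy data (an assumption for illustration): y = 2x + 1 + Gaussian noise.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.5) for x in xs]
feats = [[x, 1.0] for x in xs]  # features for (slope, intercept)
n = len(xs)

# For the loss 0.5 * residual^2, the Hessian of the mean loss is mean(phi phi^T).
hessian = [[sum(f[i] * f[j] for f in feats) / n for j in range(2)]
           for i in range(2)]

# Least-squares fit via the normal equations, reusing the same solver.
rhs = [[sum(f[i] * y for f, y in zip(feats, ys)) / n] for i in range(2)]
theta = [row[0] for row in solve(hessian, rhs)]

# Per-sample gradient of 0.5 * residual^2 is residual * phi.
resid = [theta[0] * f[0] + theta[1] * f[1] - y for f, y in zip(feats, ys)]
grads = [[r * f[0], r * f[1]] for r, f in zip(resid, feats)]

penalty = tic_penalty(grads, hessian)
print(f"TIC penalty estimate: {penalty:.5f}")
```

In this well-specified toy case C is approximately the noise variance times H, so the penalty should land near (noise variance × number of parameters) / n. For DNNs, forming and inverting the full Hessian is generally infeasible, which is why approximate TIC estimators matter in practice.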
Taxonomy
Topics: Face and Expression Recognition · Stochastic Gradient Optimization Techniques · Advanced Statistical Modeling Techniques
