Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
Haotian Ju, Dongyue Li, Hongyang R. Zhang

TL;DR
This paper introduces a Hessian-based distance measure for analyzing and improving the generalization of fine-tuned deep neural networks, particularly when the target dataset is small or the training labels are noisy. The measure is supported by theoretical generalization bounds and empirical results.
Contribution
The paper proposes a Hessian-based generalization measure for fine-tuning, proves generalization bounds in terms of that measure, and develops an algorithm that mitigates overfitting when fine-tuning with noisy labels.
Findings
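The Hessian-based distance correlates well with the observed generalization gaps of fine-tuned models
Generalization bounds based on the Hessian distance are proved for fine-tuned models
The proposed algorithm improves fine-tuning performance on datasets with noisy labels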
Abstract
We consider fine-tuning a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which has often been observed (e.g., when the target dataset is small or when the training labels are noisy). Existing generalization measures for deep networks depend on notions such as distance from the initialization (i.e., the pretrained network) of the fine-tuned model and noise stability properties of deep networks. This paper identifies a Hessian-based distance measure through PAC-Bayesian analysis, which is shown to correlate well with observed generalization gaps of fine-tuned models. Theoretically, we prove Hessian distance-based generalization bounds for fine-tuned models. We also describe an extended study of fine-tuning against label noise, where overfitting remains a critical problem. We present an…
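The abstract does not define the Hessian-based distance, so the sketch below does not implement the paper's measure. It only illustrates one ingredient that Hessian-based quantities typically rely on: summarizing the loss Hessian at the fine-tuned weights without forming the full matrix. Everything in it is an assumption made for illustration, including the toy least-squares loss, the finite-difference Hessian-vector product, the Hutchinson trace estimator, and the function names `loss_grad`, `hvp`, and `hutchinson_trace`.

```python
import numpy as np

def loss_grad(w, X, y):
    """Gradient of the toy loss 0.5 * mean((X @ w - y)**2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def hvp(w, v, X, y, eps=1e-3):
    """Hessian-vector product H @ v via a central difference of gradients."""
    return (loss_grad(w + eps * v, X, y) - loss_grad(w - eps * v, X, y)) / (2 * eps)

def hutchinson_trace(w, X, y, num_samples=500, seed=0):
    """Estimate tr(H) by averaging v^T H v over random sign vectors v."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=w.shape)  # Rademacher probe vector
        total += v @ hvp(w, v, X, y)
    return total / num_samples

# Hypothetical toy data; w stands in for the weights of a fine-tuned model.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)
w = rng.normal(size=5)

estimate = hutchinson_trace(w, X, y)
exact = np.trace(X.T @ X / len(y))  # closed-form Hessian trace for this toy loss
print(f"estimated trace: {estimate:.3f}, exact trace: {exact:.3f}")
```

For a deep network the gradient would come from automatic differentiation rather than a closed form, and the Hessian-vector product is what keeps the cost manageable, since the full Hessian is never materialized.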
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
