PoF: Post-Training of Feature Extractor for Improving Generalization
Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

TL;DR
PoF is a post-training algorithm that fine-tunes the feature extractor of deep models to find flatter minima, thereby enhancing generalization performance with minimal additional training.
Contribution
The paper introduces a data-driven, perturbation-based post-training method that improves generalization by steering the feature extractor toward flatter minima.
Findings
Improved accuracy on the CIFAR-10 and CIFAR-100 datasets after 10 epochs of post-training.
Improved performance on the SVHN dataset after 50 epochs of post-training.
Theoretical analysis shows that the algorithm implicitly reduces the targeted Hessian components as well as the loss.
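As a generic illustration of why training under parameter perturbations can reduce curvature, consider the standard second-order expansion below. It is not the paper's own derivation. The zero-mean perturbation \epsilon and its covariance \Sigma are assumptions introduced here; \theta denotes the model parameters and H the Hessian of the loss L at \theta.

```latex
\mathbb{E}_{\epsilon}\!\left[ L(\theta + \epsilon) \right]
\approx L(\theta)
+ \nabla L(\theta)^{\top}\, \mathbb{E}[\epsilon]
+ \tfrac{1}{2}\, \mathbb{E}\!\left[ \epsilon^{\top} H \epsilon \right]
= L(\theta) + \tfrac{1}{2} \operatorname{tr}\!\left( H \Sigma \right)
```

To second order, minimizing the left-hand side therefore penalizes both the loss and the Hessian components weighted by \Sigma. If \Sigma is nonzero only on the higher-layer parameters, only that block of H enters the trace.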
Abstract
The local shape of the loss landscape near a minimum, especially its flatness, has been intensively studied for its important role in the generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search for a flatter minimum. Its characteristics are two-fold: 1) the feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening the higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner, aiming to reduce the part of the test loss caused by positive loss curvature. We provide a theoretical analysis showing that the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model…
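The first characteristic above can be sketched on a toy model. The following is a minimal illustration, not the authors' implementation. It assumes a one-hidden-layer binary classifier in NumPy, where W1 plays the role of the feature extractor and w2 the higher-layer (classifier) parameters. The uniform perturbation, the fixed `radius`, and all function names are hypothetical choices made for this sketch. The data-driven choice of perturbation range described in the abstract is not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grad_w1(W1, w2, X, y):
    """Binary cross-entropy loss and its gradient w.r.t. W1 only.

    Toy model: features h = tanh(X @ W1), logit z = h @ w2.
    """
    h = np.tanh(X @ W1)                  # feature extractor output
    p = sigmoid(h @ w2)                  # classifier head
    eps = 1e-12                          # numerical guard inside the logs
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    dz = (p - y) / len(y)                # dLoss/dlogit
    dh = np.outer(dz, w2)                # back through the classifier head
    dW1 = X.T @ (dh * (1.0 - h ** 2))    # back through tanh
    return loss, dW1


def post_train_feature_extractor(W1, w2, X, y, radius=0.1, lr=0.05,
                                 steps=200, seed=0):
    """Update W1 while w2 is randomly perturbed at each step.

    ASSUMPTIONS (not from the paper): perturbations are uniform in
    [-radius, radius] per coordinate, and radius is a fixed hyperparameter.
    The stored classifier w2 itself is never modified.
    """
    rng = np.random.default_rng(seed)
    W1 = W1.copy()                       # leave the caller's array untouched
    for _ in range(steps):
        delta = rng.uniform(-radius, radius, size=w2.shape)
        _, grad = loss_and_grad_w1(W1, w2 + delta, X, y)
        W1 -= lr * grad                  # only the feature extractor moves
    return W1


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(64, 4))
    y = (X[:, 0] > 0).astype(float)      # synthetic labels
    W1 = rng.normal(scale=0.5, size=(4, 3))
    w2 = rng.normal(scale=0.5, size=3)
    W1_post = post_train_feature_extractor(W1, w2, X, y)
    print("loss before:", loss_and_grad_w1(W1, w2, X, y)[0])
    print("loss after: ", loss_and_grad_w1(W1_post, w2, X, y)[0])
```

Keeping w2 fixed and perturbing it only inside the loop means every gradient step asks the features to work well for a neighborhood of classifiers rather than a single one, which is the intuition behind seeking flatness in the higher-layer parameter space.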
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Model Reduction and Neural Networks
Methods: Test
