TL;DR
This paper introduces Pairwise Margin Maximization (PMM), a regularization method for deep neural networks that measures the minimal displacement an instance needs before its predicted class switches. The authors report better generalization than with traditional weight decay.
Contribution
The paper proposes PMM, a new regularization scheme tailored for deep networks, addressing limitations of the maximum margin principle in multi-class classification.
Findings
PMM leads to substantial performance improvements over standard regularization.
Implementing PMM in the deep feature space enhances training stability.
Empirical results show better generalization with PMM.
Abstract
The weight decay regularization term is widely used during training to constrain expressivity, avoid overfitting, and improve generalization. Historically, this concept was borrowed from the SVM maximum margin principle and extended to multi-class deep networks. Carefully inspecting this principle reveals that it is not optimal for multi-class classification in general, and in particular when using deep neural networks. In this paper, we explain why this commonly used principle is not optimal and propose a new regularization scheme, called Pairwise Margin Maximization (PMM), which measures the minimal amount of displacement an instance should take until its predicted classification is switched. In deep neural networks, PMM can be implemented in the vector space before the network's output layer, i.e., in the deep feature space, where we add an additional normalization term to…
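To make the "minimal displacement" idea in the abstract concrete, here is a minimal sketch, assuming a linear output layer on top of the deep feature space. For a feature vector x with true class i, the Euclidean distance to the decision boundary against class j is ((w_i - w_j)·x + b_i - b_j) / ||w_i - w_j||, which is standard hyperplane geometry. The function name `pairwise_margins` and the toy weights are hypothetical, and this is not the paper's loss or its normalization term, which the excerpt above does not spell out.

```python
import math


def pairwise_margins(features, weights, biases, label):
    """Signed distance from `features` to each pairwise decision boundary.

    Illustrative only: assumes a linear output layer with one weight row
    and one bias per class, and that no two rows of `weights` are equal.
    A positive value means the instance sits on the correct side of the
    boundary between its true class `label` and the rival class j.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w_true = weights[label]
    score_true = dot(w_true, features) + biases[label]

    margins = {}
    for j, (w_j, b_j) in enumerate(zip(weights, biases)):
        if j == label:
            continue
        score_j = dot(w_j, features) + b_j
        # Norm of the weight difference: the boundary between the two
        # classes is the hyperplane (w_true - w_j) . x + (b_true - b_j) = 0.
        diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_true, w_j)))
        margins[j] = (score_true - score_j) / diff_norm
    return margins


# Toy example: 2-D features, 3 classes, true class is 1.
features = [1.0, 2.0]
weights = [[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]

margins = pairwise_margins(features, weights, biases, label=1)
smallest = min(margins.values())
print(margins)   # {0: 1.0, 2: 2.0}
print(smallest)  # 1.0
```

The smallest pairwise margin is the shortest move in feature space that would flip the prediction. Note that each distance depends on the difference of two weight vectors, not on the norm of any single one, which is one way to see why penalizing individual weight norms (as weight decay does) is not the same as maximizing this quantity.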
Taxonomy
Methods: Weight Decay · Support Vector Machine
