Memorization in Deep Neural Networks: Does the Loss Function matter?
Deep Patel, P.S. Sastry

TL;DR
This paper investigates how the choice of loss function affects a deep neural network's tendency to memorize randomly labelled data, and shows that symmetric losses make the network more resistant to such overfitting.
Contribution
It provides empirical evidence, a formal definition of robustness to memorization, and a theoretical explanation of why symmetric loss functions make deep neural networks more resistant to memorization.
Findings
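Compared with cross-entropy and squared error loss, a symmetric loss function significantly improves a network's ability to resist overfitting randomly labelled data on MNIST and CIFAR-10.
Earlier empirical studies, cited as motivation, report that none of the standard regularization techniques mitigate this kind of memorization.
A theoretical analysis, built on a formal definition of robustness to memorization, explains why symmetric losses confer this robustness.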
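To make the term "symmetric loss" concrete, here is a minimal sketch. It assumes the definition commonly used in the label-noise literature: a loss L is symmetric if the sum of L(f(x), k) over all K classes k is the same constant for every prediction f(x). The summary above does not say which symmetric loss the authors use, so mean absolute error (MAE) on softmax outputs is swapped in here as a familiar example. The function names and the random logits are illustrative only.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mae_loss(probs, label):
    # L1 distance between the one-hot target and the predicted probabilities.
    one_hot = np.zeros_like(probs)
    one_hot[label] = 1.0
    return np.abs(one_hot - probs).sum()

def cce_loss(probs, label):
    # Categorical cross-entropy for a single example.
    return -np.log(probs[label])

def loss_sum_over_labels(loss_fn, probs):
    # Sum of the loss over every possible label for one fixed prediction.
    return sum(loss_fn(probs, k) for k in range(probs.shape[0]))

rng = np.random.default_rng(0)
K = 10  # number of classes (assumed; matches MNIST and CIFAR-10)
probs_batch = softmax(rng.normal(size=(5, K)))  # 5 arbitrary predictions

mae_sums = np.array([loss_sum_over_labels(mae_loss, p) for p in probs_batch])
cce_sums = np.array([loss_sum_over_labels(cce_loss, p) for p in probs_batch])

print("MAE sums over labels:", mae_sums)  # identical for every prediction
print("CCE sums over labels:", cce_sums)  # differs from prediction to prediction
```

For MAE the sum works out to 2(K - 1) whatever the prediction, since each term equals 2(1 - p_k) and the probabilities sum to one. Cross-entropy has no such constant, which is the property that separates the two families of losses.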
Abstract
Deep Neural Networks, often owing to overparameterization, have been shown to be capable of exactly memorizing even randomly labelled data. Empirical studies have also shown that none of the standard regularization techniques mitigate such overfitting. We investigate whether the choice of the loss function can affect this memorization. We empirically show, with benchmark data sets MNIST and CIFAR-10, that a symmetric loss function, as opposed to either cross-entropy or squared error loss, results in significant improvement in the ability of the network to resist such overfitting. We then provide a formal definition for robustness to memorization and provide a theoretical explanation as to why the symmetric losses provide this robustness. Our results clearly bring out the role loss functions alone can play in this phenomenon of memorization.
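The "randomly labelled data" setting in the abstract can be sketched as follows. This is an illustrative setup only, not the authors' protocol: it assumes every training label is replaced by a draw from a uniform distribution over the K classes, and the helper name `randomize_labels` is hypothetical. The abstract does not state the exact corruption scheme.

```python
import numpy as np

def randomize_labels(labels, num_classes, rng):
    # Replace every label with a uniform random class, independent of the input.
    # (Assumed scheme; the abstract does not specify how labels were randomized.)
    return rng.integers(low=0, high=num_classes, size=labels.shape)

rng = np.random.default_rng(0)
K = 10                                   # classes, as in MNIST and CIFAR-10
clean = rng.integers(0, K, size=10_000)  # stand-in for the true training labels
noisy = randomize_labels(clean, K, rng)

# Only about 1/K of the random labels agree with the true ones by chance,
# so fitting them exactly requires memorizing individual examples.
agreement = float((noisy == clean).mean())
print(f"fraction of labels unchanged: {agreement:.3f}")
```

A network that reaches near-zero training error on `noisy` has memorized the training set, because the labels carry no information about the inputs. That is the behaviour a loss that is robust to memorization is meant to resist.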
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
