Neural Network Memorization Dissection

Jindong Gu; Volker Tresp

arXiv:1911.09537 · cs.LG · November 22, 2019 · 6 cites

Open Access

TL;DR

This paper investigates how deep neural networks memorize data, highlighting differences in learning patterns between true and random labels, and introduces methods to compare learned representations.

Contribution

It provides empirical analysis of DNN memorization and proposes a novel approach to measure similarity between learned representations across models.

Findings

01. DNNs prioritize simple input patterns during learning

02. DNNs trained on true vs. random labels exhibit different memorization behaviors

03. Gradient information helps interpret memorization and learning patterns

Abstract

Deep neural networks (DNNs) can easily fit a random labeling of the training data with zero training error. What is the difference between DNNs trained with random labels and the ones trained with true labels? Our paper answers this question with two contributions. First, we study the memorization properties of DNNs. Our empirical experiments shed light on how DNNs prioritize the learning of simple input patterns. In the second part, we propose to measure the similarity between what different DNNs have learned and memorized. With the proposed approach, we analyze and compare DNNs trained on data with true labels and random labels. The analysis shows that DNNs have "One way to Learn" and "N ways to Memorize". We also use gradient information to gain an understanding of the analysis results.
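
The abstract's opening claim, that an over-parameterized network can fit random labels with zero training error, can be illustrated with a small sketch. This is not the paper's experimental setup: it assumes a one-hidden-layer ReLU network whose hidden layer is frozen at random and whose output weights are fit in closed form (minimum-norm least squares). The data are synthetic, and all names (`relu_features`, `solve`, `fit_random_labels`) are hypothetical.

```python
import random


def relu_features(x, weights, biases):
    """Hidden activations of a one-hidden-layer ReLU network for input x."""
    return [max(0.0, sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def solve(A, y):
    """Solve A a = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y_i] for row, y_i in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (M[r][n] - s) / M[r][r]
    return a


def fit_random_labels(n_points=20, n_inputs=5, n_hidden=200, seed=0):
    """Fit labels drawn independently of the inputs; return the training error."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_points)]
    y = [rng.choice([-1.0, 1.0]) for _ in range(n_points)]  # random labels

    # Assumption: the hidden layer is random and frozen (more units than points).
    W = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b = [rng.gauss(0, 1) for _ in range(n_hidden)]
    H = [relu_features(x, W, b) for x in X]

    # Minimum-norm output weights: v = H^T a, where (H H^T) a = y.
    K = [[sum(p * q for p, q in zip(h_i, h_j)) for h_j in H] for h_i in H]
    a = solve(K, y)
    v = [sum(a[i] * H[i][j] for i in range(n_points)) for j in range(n_hidden)]

    preds = [sum(v_j * h_j for v_j, h_j in zip(v, h)) for h in H]
    errors = sum(1 for p, t in zip(preds, y) if (p > 0) != (t > 0))
    return errors / n_points


train_error = fit_random_labels()
print(train_error)
```

Because the hidden layer has many more units than there are training points, the output layer can interpolate any labeling of those points, which is memorization in its simplest form. The sketch says nothing about the paper's similarity measure or gradient analysis, which the abstract does not specify.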

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Music and Audio Processing