A Law of Data Separation in Deep Learning

Hangfeng He; Weijie J. Su

arXiv:2210.17020 · cs.LG · August 14, 2023

TL;DR

This paper uncovers a fundamental law describing how deep neural networks progressively separate data by class in each layer, providing insights for architecture design, robustness, and interpretability.

Contribution

It introduces a simple, quantitative law governing data separation in neural networks, validated across architectures and datasets, aiding future AI development.

Findings

01

Each layer improves data separation at a constant geometric rate.

02

The law emerges consistently during training across various architectures.

03

The law offers practical guidelines for designing robust and interpretable models.

Abstract

While deep learning has enabled significant advances in many areas of science, its black-box nature hinders architecture design for future artificial intelligence applications and interpretation for high-stakes decision making. We address this issue by studying the fundamental question of how deep neural networks process data in the intermediate layers. Our finding is a simple and quantitative law that governs how deep neural networks separate data according to class membership throughout all layers for classification. This law shows that each layer improves data separation at a constant geometric rate, and its emergence is observed in a collection of network architectures and datasets during training. This law offers practical guidelines for designing architectures, improving model robustness and out-of-sample performance, as well as interpreting the predictions.
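
The abstract does not say how "data separation" is measured, so the following is only a minimal sketch under stated assumptions. It assumes separation at layer l is summarized by a scalar D_l = Tr(SS_w · SS_b⁺), the within-class scatter of the layer's features weighted by the pseudoinverse of the between-class scatter, and that the "constant geometric rate" means D_l ≈ ρ^l · D_0 for some 0 < ρ < 1, so that log D_l is roughly linear in the layer index. The function names `separation_fuzziness` and `fit_geometric_rate` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def separation_fuzziness(features, labels):
    """Return Tr(SS_w @ pinv(SS_b)) for an (n, d) feature matrix.

    Assumed definitions (not stated on this page):
      SS_w = (1/n) * sum over classes of within-class scatter
      SS_b = (1/n) * sum over classes of n_k * outer(mean_k - mean, mean_k - mean)
    Smaller values mean the classes are better separated.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n, d = features.shape
    global_mean = features.mean(axis=0)
    ss_w = np.zeros((d, d))
    ss_b = np.zeros((d, d))
    for k in np.unique(labels):
        class_feats = features[labels == k]
        class_mean = class_feats.mean(axis=0)
        centered = class_feats - class_mean
        ss_w += centered.T @ centered / n
        diff = (class_mean - global_mean)[:, None]
        ss_b += len(class_feats) * (diff @ diff.T) / n
    # SS_b has rank at most (number of classes - 1), so use the pseudoinverse.
    return float(np.trace(ss_w @ np.linalg.pinv(ss_b)))


def fit_geometric_rate(fuzziness_per_layer):
    """Least-squares fit of log D_l = log D_0 + l * log(rho).

    Returns (rho, D_0). Layer indices are assumed to be 0, 1, 2, ...
    """
    values = np.asarray(fuzziness_per_layer, dtype=float)
    layer_index = np.arange(len(values))
    slope, intercept = np.polyfit(layer_index, np.log(values), 1)
    return float(np.exp(slope)), float(np.exp(intercept))
```

In use, one would collect the features of a fixed dataset after each layer of a trained network, call `separation_fuzziness` on each, and pass the resulting list to `fit_geometric_rate`. If the values decay geometrically, the fitted `rho` is the per-layer factor and the points lie close to a straight line on a log scale.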

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Global Average Pooling · Kaiming Initialization · Max Pooling · Residual Connection · Bottleneck Residual Block · Residual Block