Understanding Deep Neural Networks via Linear Separability of Hidden Layers

Chao Zhang; Xinyu Chen; Wensheng Li; Lixue Liu; Wei Wu; Dacheng Tao

arXiv:2307.13962 · cs.LG · July 27, 2023 · 1 citation

Open Access

TL;DR

This paper investigates the linear separability of hidden layers in deep neural networks using Minkowski difference-based measures, showing that separability correlates with training performance and is affected by network architecture and activation functions.

Contribution

It introduces Minkowski difference-based measures for evaluating linear separability and links these measures to training performance across various network architectures.
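As orientation, the standard link between Minkowski differences and linear separability can be written as follows for finite point sets A, B ⊂ ℝ^n. The notation here is this note's own, the Minkowski difference is taken to mean the set of pairwise differences, and the paper's exact MD-LSM definitions may differ:

```latex
A \ominus B = \{\, a - b : a \in A,\ b \in B \,\}

\exists\, w, c:\ w^\top a > c > w^\top b \quad \forall a \in A,\ b \in B
\;\iff\; \exists\, w:\ w^\top d > 0 \quad \forall d \in A \ominus B
\;\iff\; 0 \notin \operatorname{conv}(A \ominus B)
```

The first equivalence holds because, for finite sets, w^T(a − b) > 0 for every pair means min over A of w^T a exceeds max over B of w^T b, so any c between them works. The second follows from the strict separating hyperplane theorem applied to the compact convex set conv(A ⊖ B) and the origin. One natural graded variant asks what fraction of A ⊖ B lies on the positive side of the best such w.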

Findings

1. Linear separability correlates with training performance.
2. Activation functions and network size affect separability.
3. Numerical experiments validate the measures across multiple architectures.

Abstract

In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks. In particular, we first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets. Then, we demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance, i.e., if the updated weights can enhance the linear separability degree of hidden layer outputs, the updated network will achieve a better training performance, and vice versa. Moreover, we study the effects of activation functions and network size (including width and depth) on the linear separability of hidden layers. Finally, we conduct numerical experiments to validate our findings on some popular deep networks including multilayer perceptron…
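To make the Minkowski-difference idea concrete, here is a minimal pure-Python sketch. It relies on the fact that finite sets A and B are strictly linearly separable exactly when some w satisfies w·(a − b) > 0 for every pair (a, b). The helper names (`minkowski_difference`, `separability_degree`, `relu_layer`) and the pocket-perceptron "fraction on the positive side" score are hypothetical illustrations chosen for this note, not the MD-LSMs defined in the paper.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def minkowski_difference(A, B):
    """Every pairwise difference a - b with a in A and b in B."""
    return [tuple(ai - bi for ai, bi in zip(a, b)) for a, b in product(A, B)]


def positive_fraction(w, D):
    """Share of difference vectors d that satisfy w . d > 0."""
    return sum(dot(w, d) > 0 for d in D) / len(D)


def separability_degree(A, B, max_epochs=1000):
    """Hypothetical separability proxy, NOT the paper's exact MD-LSM.

    Runs a bias-free perceptron on the Minkowski difference A - B (all labels
    +1) and keeps the best weight vector seen (the "pocket" heuristic).
    Returns (w, fraction): fraction == 1.0 certifies that A and B are strictly
    linearly separable; a smaller value is only a lower bound on the best
    fraction any hyperplane through the origin can achieve.
    Assumes A and B are non-empty lists of equal-length tuples.
    """
    D = minkowski_difference(A, B)
    w = [0.0] * len(D[0])
    best_w, best_frac = list(w), positive_fraction(w, D)
    for _ in range(max_epochs):
        updated = False
        for d in D:
            if dot(w, d) <= 0:  # d is on the wrong side: perceptron update
                w = [wi + di for wi, di in zip(w, d)]
                updated = True
                frac = positive_fraction(w, D)
                if frac > best_frac:
                    best_w, best_frac = list(w), frac
        if not updated:  # a full pass with no mistakes: w . d > 0 for all d
            return w, 1.0
    return best_w, best_frac


def relu_layer(X, W, b):
    """Hypothetical hidden layer x -> ReLU(W x + b) with hand-picked weights."""
    return [tuple(max(0.0, dot(row, x) + bi) for row, bi in zip(W, b)) for x in X]


if __name__ == "__main__":
    # XOR: class A = {(0,0), (1,1)}, class B = {(1,0), (0,1)}.
    A, B = [(0, 0), (1, 1)], [(1, 0), (0, 1)]
    print(separability_degree(A, B)[1])  # 0.5: not linearly separable

    # The same points after one hand-picked ReLU hidden layer.
    W, b = [(1, 1), (1, 1)], [0, -1]
    print(separability_degree(relu_layer(A, W, b), relu_layer(B, W, b))[1])  # 1.0
```

The XOR example illustrates why one might measure separability at hidden layers: the raw inputs are not linearly separable, while the outputs of a single ReLU layer with hand-picked (not trained) weights are. The perceptron search is a heuristic; an exact check of 0 ∉ conv(A ⊖ B) would instead solve a small linear program.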
