Batch Normalization Is Blind to the First and Second Derivatives of the Loss

Zhanpeng Zhou; Wen Shen; Huixin Chen; Ling Tang; Quanshi Zhang

arXiv:2205.15146 · cs.LG · June 3, 2022

TL;DR

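This paper proves that the standardization phase of Batch Normalization (BN) blocks the back-propagated influence of the first-order term, and most of the influence of the second-order term, in the Taylor expansion of the loss. The effect changes feature representations most clearly in tasks where the losses of different samples share similar analytic formulas.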

Contribution

The paper gives a theoretical analysis showing that BN blocks the influence of the first-order term, and most of the influence of the second-order term, in the Taylor expansion of the loss, and it supports the analysis with experiments.

Findings

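01 BN blocks the influence of the first-order term, and most of the influence of the second-order term, in the Taylor expansion of the loss

02 The standardization phase of BN causes this blocking effect

03 BN significantly affects feature representations in tasks where the losses of different samples share similar analytic formulas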
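The role of the standardization phase can be illustrated with a minimal sketch of its backward pass. This is a simplified illustration, not the paper's proof. It assumes a single feature channel, no learnable scale or shift, zero epsilon, and that every sample shares the same first derivative `c1` and second derivative `c2` of the loss with respect to the standardized feature. The function names and all numeric values are hypothetical.

```python
import math

def standardize(xs):
    """Standardization phase of BN: subtract the batch mean, divide by the batch std."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs], std

def standardize_backward(ys, std, upstream):
    """Gradient w.r.t. the inputs, given the gradient w.r.t. the standardized outputs."""
    n = len(ys)
    mean_g = sum(upstream) / n
    mean_gy = sum(g * y for g, y in zip(upstream, ys)) / n
    return [(g - mean_g - y * mean_gy) / std for g, y in zip(upstream, ys)]

xs = [0.3, -1.2, 2.5, 0.8]   # hypothetical pre-BN features of one channel
ys, std = standardize(xs)
c1, c2 = 0.7, -1.3           # hypothetical shared first and second derivatives

# First-order term c1 * y: its gradient w.r.t. y is the same constant for every sample.
grad_first = standardize_backward(ys, std, [c1] * len(ys))

# Second-order term 0.5 * c2 * y**2: its gradient w.r.t. y is c2 * y.
grad_second = standardize_backward(ys, std, [c2 * y for y in ys])

# A generic upstream gradient that differs across samples is not blocked.
grad_generic = standardize_backward(ys, std, [1.0, 0.0, 0.0, 0.0])

print(grad_first)    # numerically zero
print(grad_second)   # numerically zero
print(grad_generic)  # nonzero
```

Under these assumptions, mean subtraction removes any upstream gradient that is constant across the batch, and division by the standard deviation removes any upstream gradient proportional to the standardized feature itself. Only gradients outside those two directions reach the layers before BN.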

Abstract

In this paper, we analyze how the BN operation affects the back-propagation of the first and second derivatives of the loss. Expanding the loss function as a Taylor series, we prove that the BN operation blocks the influence of the first-order term and most of the influence of the second-order term of the loss. We also find that this problem is caused by the standardization phase of the BN operation. Experimental results verify our theoretical conclusions, and show that the BN operation significantly affects feature representations in specific tasks where the losses of different samples share similar analytic formulas.
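The Taylor-expansion argument can also be viewed from the forward side. The sketch below is a simplified scalar case and not the paper's derivation: it assumes one feature channel, zero epsilon, no affine scale or shift, and shared coefficients `g` and `h` for all samples, which corresponds to losses with the same analytic formula. All names and values are hypothetical. Under these assumptions the summed first-order and second-order terms take the same value for any input batch, so they carry no signal about the inputs.

```python
import math

def standardize(xs):
    """Zero-mean, unit-variance standardization over one batch (biased variance)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]

def taylor_terms(ys, g, h):
    """Batch-summed first-order and second-order Taylor terms of a per-sample loss."""
    first = sum(g * y for y in ys)
    second = sum(0.5 * h * y * y for y in ys)
    return first, second

g, h = 0.7, -1.3  # hypothetical shared first and second derivatives
batches = [
    [0.3, -1.2, 2.5, 0.8],      # hypothetical batch A
    [10.0, 4.0, -7.5, 3.25],    # hypothetical batch B, very different inputs
]

results = [taylor_terms(standardize(b), g, h) for b in batches]
for first, second in results:
    # first is numerically 0; second is numerically 0.5 * h * batch_size
    print(first, second)
```

Because the standardized features always sum to zero and their squares always sum to the batch size, both terms are constants with respect to the inputs in this simplified setting.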

Taxonomy

Topics: Neural Networks and Applications · Face and Expression Recognition · Image and Signal Denoising Methods