Accelerating Training of Batch Normalization: A Manifold Perspective
Mingyang Yi

TL;DR
This paper introduces a manifold-based approach to optimize batch normalization networks, ensuring convergence to equivalent optima and accelerating training by leveraging the PSI manifold structure.
Contribution
It proposes a quotient manifold framework for BN networks, enabling gradient methods to converge efficiently to equivalent optima and improve training speed.
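This summary does not specify how the PSI manifold is built, so the following is only a minimal sketch of the quotient idea, not the author's algorithm. It assumes a toy model: a single linear unit followed by batch normalization, with no epsilon and no learnable affine parameters. Gradients are taken by finite differences. The unit sphere serves as a hypothetical stand-in for "weights modulo positive re-scaling". The helper names (bn_loss, num_grad, euclidean_gd, sphere_gd, direction) are illustrative. The sketch starts plain gradient descent on the raw weights and Riemannian gradient descent on the sphere from two equivalent points, w0 and 7 * w0, and compares where each method ends up.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 5))  # one mini-batch: 32 samples, 5 features
y = rng.normal(size=32)       # arbitrary regression targets

def bn_loss(w):
    """One linear unit + batch normalization (assumed: no epsilon, no affine
    parameters), so bn_loss(c * w) == bn_loss(w) for every c > 0."""
    z = X @ w
    z_hat = (z - z.mean()) / z.std()  # normalize over the batch
    return np.mean((z_hat - y) ** 2)

def num_grad(f, w, h=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

def euclidean_gd(w0, lr=0.1, steps=100):
    """Plain gradient descent directly on the raw weights."""
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * num_grad(bn_loss, w)
    return w

def sphere_gd(w0, lr=0.1, steps=100):
    """Riemannian gradient descent on the unit sphere, a hypothetical
    stand-in for optimizing over classes of re-scaled weights."""
    u = w0 / np.linalg.norm(w0)    # pick the unit-norm representative
    for _ in range(steps):
        g = num_grad(bn_loss, u)
        g = g - (g @ u) * u        # project onto the tangent space at u
        u = u - lr * g
        u = u / np.linalg.norm(u)  # retract back onto the sphere
    return u

def direction(v):
    return v / np.linalg.norm(v)

w0 = rng.normal(size=5)
c = 7.0  # positive re-scaling factor

# Equivalent starting points, Euclidean updates.
e_a, e_b = euclidean_gd(w0), euclidean_gd(c * w0)
# Equivalent starting points, quotient-style updates on the sphere.
s_a, s_b = sphere_gd(w0), sphere_gd(c * w0)

print("Euclidean GD, same direction:", np.allclose(direction(e_a), direction(e_b)))
print("sphere GD, same point:", np.allclose(s_a, s_b))
print("losses:", bn_loss(e_a), bn_loss(e_b), bn_loss(s_a))
```

Every class of equivalent weights has exactly one unit-norm representative, so the sphere iterates depend only on the class and not on the starting scale. Plain gradient descent instead changes the direction of w with an effective step size of roughly lr / ||w||^2, so the two equivalent starts drift apart. For an exactly scale-invariant loss the gradient is already orthogonal to u, and the tangent projection only removes numerical error.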
Findings
Accelerates training compared with optimizing in Euclidean space.
Guarantees convergence to equivalent optima on the PSI manifold.
Improves generalization ability across various experiments.
Abstract
Batch normalization (BN) has become a critical component across diverse deep neural networks. A network with BN is invariant to positive linear re-scaling of its weights, so there exist infinitely many functionally equivalent networks whose weights differ only in scale. However, when these equivalent networks are optimized with a first-order method such as stochastic gradient descent, their gradients differ during training, and the resulting iterates converge to different local optima. To obviate this, we propose a quotient manifold, the PSI manifold, in which all equivalent weights of a network with BN are regarded as the same element. We then construct gradient descent and stochastic gradient descent on the proposed PSI manifold to train networks with BN. The two algorithms guarantee that every group of equivalent weights (caused by positive re-scaling)…
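The scale invariance and the gradient mismatch described in the abstract can be checked numerically. The following is a minimal sketch under stated assumptions. It uses a toy model: a single linear unit followed by batch normalization, without epsilon or learnable affine parameters. Gradients are estimated by finite differences. bn_loss, num_grad, and direction are hypothetical helper names, not taken from the paper. The sketch shows three things: w and c * w give the same loss, the gradient at c * w equals the gradient at w divided by c, and one step of plain gradient descent from each point produces weights that are no longer equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))  # one mini-batch: 32 samples, 5 features
y = rng.normal(size=32)       # arbitrary regression targets

def bn_loss(w):
    """Squared error of one linear unit followed by batch normalization.

    Assumption: BN without epsilon and without learnable affine
    parameters, so the loss is exactly invariant to w -> c * w, c > 0.
    """
    z = X @ w
    z_hat = (z - z.mean()) / z.std()  # normalize over the batch
    return np.mean((z_hat - y) ** 2)

def num_grad(f, w, h=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

def direction(v):
    return v / np.linalg.norm(v)

w = rng.normal(size=5)
c = 10.0  # positive re-scaling factor

# w and c * w are functionally equivalent: same loss value.
loss_w, loss_cw = bn_loss(w), bn_loss(c * w)

# Their gradients differ: grad f(c w) = grad f(w) / c,
# and each gradient is orthogonal to its own weight vector.
g_w, g_cw = num_grad(bn_loss, w), num_grad(bn_loss, c * w)

# One plain gradient step from each equivalent starting point.
lr = 0.1
w_next = w - lr * g_w
cw_next = c * w - lr * g_cw

print("losses:", loss_w, loss_cw)
print("grad(c w) == grad(w) / c:", np.allclose(g_cw, g_w / c, rtol=1e-4, atol=1e-8))
print("still equivalent after one step:",
      np.allclose(direction(w_next), direction(cw_next)))
```

The 1/c factor follows from differentiating f(c w) = f(w) with respect to w, which gives c ∇f(c w) = ∇f(w). Standard BN implementations add a small epsilon to the variance, so in practice the invariance holds only approximately.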
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Face and Expression Recognition
