A Robust Initialization of Residual Blocks for Effective ResNet Training without Batch Normalization

Enrico Civitelli, Alessio Sortino, Matteo Lapucci, Francesco Bagattini, and Giulio Galvan

arXiv:2112.12299 · cs.LG · November 7, 2023 · 1 citation

TL;DR

This paper introduces a new weight initialization method for residual networks that eliminates the need for Batch Normalization, enabling effective training of normalization-free ResNet-like architectures with competitive results.

Contribution

It proposes a simple modification to how each residual block's output is summed with its skip connection, so that the whole network is properly initialized, improving training stability without additional regularization or algorithmic changes.
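A back-of-the-envelope variance argument suggests why this summation needs care when there is no normalization layer. This is a generic sketch, not the paper's derivation: it assumes that at initialization $x_l$ and $F(x_l)$ are uncorrelated and that the residual branch preserves variance, $\operatorname{Var}(F(x_l)) = \operatorname{Var}(x_l)$. The weights $\alpha$ and $\beta$ are illustrative and are not claimed to be the exact modification the authors propose.

$$
\begin{aligned}
x_{l+1} &= x_l + F(x_l)
  &&\Rightarrow\quad \operatorname{Var}(x_{l+1}) = 2\,\operatorname{Var}(x_l)
  \quad\Rightarrow\quad \operatorname{Var}(x_L) = 2^{L}\,\operatorname{Var}(x_0),\\
x_{l+1} &= \alpha\, x_l + \beta\, F(x_l)
  &&\Rightarrow\quad \operatorname{Var}(x_{l+1}) = (\alpha^2 + \beta^2)\,\operatorname{Var}(x_l),
\end{aligned}
$$

so choosing $\alpha^2 + \beta^2 = 1$ (for example $\alpha = \beta = 1/\sqrt{2}$) keeps the activation variance constant with depth, which is the role Batch Normalization would otherwise play at initialization.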

Findings

1. Achieves competitive accuracy on CIFAR-10 and CIFAR-100
2. Performs well on ImageNet without Batch Normalization
3. Simplifies training of normalization-free ResNets

Abstract

Batch Normalization is an essential component of all state-of-the-art neural network architectures. However, since it introduces many practical issues, much recent research has been devoted to designing normalization-free architectures. In this paper, we show that weight initialization is key to training ResNet-like normalization-free networks. In particular, we propose a slight modification to the operation that sums a block's output with the skip-connection branch, so that the whole network is correctly initialized. We show that this modified architecture achieves competitive results on CIFAR-10, CIFAR-100 and ImageNet without further regularization or algorithmic modifications.
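A small simulation can illustrate why the summation matters at initialization in a network without normalization. This is a hypothetical sketch, not the authors' code: each residual branch $F$ is modeled as a fresh random linear map with i.i.d. $\mathcal{N}(0, 1/n)$ weights (no convolutions, no ReLU), and the block computes `alpha * x + beta * F(x)`. The names `forward`, `alpha` and `beta` are illustrative assumptions. With the plain sum (`alpha = beta = 1`) the activations' second moment roughly doubles per block, while the rescaled sum with `alpha = beta = 1/sqrt(2)` keeps it on the order of 1.

```python
import math
import random


def random_matrix(n, rng):
    """n x n matrix with i.i.d. N(0, 1/n) entries; W @ x then preserves the
    second moment of x in expectation (assumed stand-in for an initialized branch)."""
    std = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]


def matvec(w, x):
    """Plain-Python matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def second_moment(x):
    """Mean of squared entries, used as a proxy for activation variance."""
    return sum(v * v for v in x) / len(x)


def forward(depth, n, alpha, beta, seed=0):
    """Propagate a random input through `depth` hypothetical residual blocks
    computing x <- alpha * x + beta * F(x), with a new random linear F per block.
    Returns the second moment of the activations after each block."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    moments = []
    for _ in range(depth):
        fx = matvec(random_matrix(n, rng), x)
        x = [alpha * xi + beta * fi for xi, fi in zip(x, fx)]
        moments.append(second_moment(x))
    return moments


plain = forward(depth=10, n=128, alpha=1.0, beta=1.0)
scaled = forward(depth=10, n=128, alpha=1 / math.sqrt(2), beta=1 / math.sqrt(2))
print(f"plain sum:  {plain[-1]:.1f}")   # grows roughly like 2**10
print(f"scaled sum: {scaled[-1]:.2f}")  # stays on the order of 1
```

Because both runs share the same seed and the branches are linear, the two trajectories differ exactly by a factor of $2^{-l}$ in second moment after $l$ blocks; with real ReLU convolutions and He-style initialization the picture is only approximately the same.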

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning