Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Giyeong Oh, Woohyun Cho, Siyeol Kim, Suhwan Choi, Youngjae Yu

TL;DR
This paper proposes Orthogonal Residual Updates, which decompose each module's output relative to the input stream and add only the component orthogonal to it, encouraging modules to learn new features. This improves training stability and accuracy across a range of deep learning architectures and datasets.
Contribution
The paper introduces a novel orthogonal decomposition of residual updates, improving feature diversity and training efficiency in deep networks.
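As a worked formula, the decomposition can be written as below. The notation is ours and not necessarily the paper's: x_n is the stream entering block n, f is the block's module, and x_n is assumed to be nonzero.

```latex
\begin{aligned}
f_{\parallel}(x_n) &= \frac{\langle f(x_n),\, x_n \rangle}{\langle x_n,\, x_n \rangle}\, x_n, \\
f_{\perp}(x_n) &= f(x_n) - f_{\parallel}(x_n), \\
\text{standard update: } x_{n+1} &= x_n + f(x_n), \\
\text{orthogonal update: } x_{n+1} &= x_n + f_{\perp}(x_n).
\end{aligned}
```

By construction, the inner product of f_⊥(x_n) with x_n equals ⟨f(x_n), x_n⟩ minus ⟨f(x_n), x_n⟩, which is zero, so the orthogonal update adds nothing along the current stream direction.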
Findings
Improved generalization accuracy across architectures and datasets.
Enhanced training stability and convergence.
Significant accuracy gains, e.g., +3.78 percentage points for ViT-B on ImageNet-1k.
Abstract
Residual connections are pivotal for deep neural networks, enabling greater depth by mitigating vanishing gradients. However, in standard residual updates, the module's output is directly added to the input stream. This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module's capacity for learning entirely novel features. In this work, we introduce Orthogonal Residual Update: we decompose the module's output relative to the input stream and add only the component orthogonal to this stream. This design aims to guide modules to contribute primarily new representational directions, fostering richer feature learning while promoting more efficient training. We demonstrate that our orthogonal update strategy improves generalization accuracy and training stability across diverse architectures (ResNetV2, Vision…
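The update described above can be sketched in plain Python. This is a minimal sketch under our own assumptions, not the authors' implementation: vectors are plain lists, the projection is taken over the whole feature vector (in a real network it would presumably be applied per sample or per token along the feature dimension), and the small `eps` in the denominator is our own guard against a zero stream. All helper names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(f_out: Vector, stream: Vector, eps: float = 1e-6) -> Vector:
    """Remove from f_out its projection onto the stream direction.

    eps (an assumption, not from the paper) avoids division by zero for a
    zero stream; with eps > 0 the result is only approximately orthogonal.
    """
    if len(f_out) != len(stream):
        raise ValueError("f_out and stream must have the same length")
    coeff = dot(f_out, stream) / (dot(stream, stream) + eps)
    return [f - coeff * x for f, x in zip(f_out, stream)]


def standard_residual_update(stream: Vector, f_out: Vector) -> Vector:
    """Standard residual: add the full module output to the stream."""
    return [x + f for x, f in zip(stream, f_out)]


def orthogonal_residual_update(stream: Vector, f_out: Vector, eps: float = 1e-6) -> Vector:
    """Orthogonal residual: add only the part of f_out orthogonal to the stream."""
    f_perp = orthogonal_component(f_out, stream, eps)
    return [x + p for x, p in zip(stream, f_perp)]


if __name__ == "__main__":
    x = [1.0, 0.0, 0.0]  # current stream
    f = [2.0, 3.0, 0.0]  # hypothetical module output
    print(standard_residual_update(x, f))
    print(orthogonal_residual_update(x, f, eps=0.0))
```

With `eps=0.0` and a nonzero stream, updating `x = [1.0, 0.0, 0.0]` with `f = [2.0, 3.0, 0.0]` gives `[3.0, 3.0, 0.0]` under the standard rule but `[1.0, 3.0, 0.0]` under the orthogonal rule: the part of `f` along `x` is discarded and only the new direction is added.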
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
