Understanding the Covariance Structure of Convolutional Filters
Asher Trockman, Devin Willmott, J. Zico Kolter

TL;DR
This paper shows that the learned filters of large-kernel depthwise convolutions in modern networks such as ConvMixer and ConvNeXt have highly structured covariances. It then introduces a simple, learning-free initialization method built from these covariances that improves performance and, in some cases, removes the need to train the depthwise filters at all.
Contribution
The paper uncovers structured covariance patterns in learned convolutional filters and proposes a novel, effective, learning-free initialization scheme based on these covariances.
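A learning-free covariance needs only the kernel size, not a trained model. The following sketch is a hypothetical stand-in, not the paper's construction. It assumes a squared-exponential (Gaussian) kernel over the (row, col) positions of a k×k filter, so nearby taps are strongly correlated and distant ones weakly. It then draws depthwise filters through a Cholesky factor. The names `rbf_filter_covariance` and `init_depthwise`, the `length_scale`, `variance`, and `jitter` parameters, and the PyTorch-style (channels, 1, k, k) output layout are all illustrative assumptions.

```python
import numpy as np


def rbf_filter_covariance(k: int, length_scale: float = 1.0,
                          variance: float = 1.0) -> np.ndarray:
    """Hypothetical closed-form covariance over the k*k positions of a filter.

    Entry (i, j) = variance * exp(-||p_i - p_j||^2 / (2 * length_scale^2)),
    where p_i is the (row, col) coordinate of position i. This
    squared-exponential form is an illustrative stand-in, not the paper's.
    """
    rows, cols = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    # Row-major order matches reshaping a (k, k) filter into a k*k vector.
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-sq_dist / (2.0 * length_scale ** 2))


def init_depthwise(channels: int, k: int, rng: np.random.Generator,
                   length_scale: float = 1.0, jitter: float = 1e-6) -> np.ndarray:
    """Sample `channels` filters from N(0, cov) without any training data."""
    cov = rbf_filter_covariance(k, length_scale)
    # Small diagonal jitter guards the Cholesky factorization against round-off.
    chol = np.linalg.cholesky(cov + jitter * np.eye(k * k))
    z = rng.standard_normal((channels, k * k))  # independent N(0, 1) draws
    # Each row z_i @ chol.T has covariance chol @ chol.T = cov + jitter * I.
    return (z @ chol.T).reshape(channels, 1, k, k)


rng = np.random.default_rng(0)
w = init_depthwise(channels=64, k=7, rng=rng)
print(w.shape)  # (64, 1, 7, 7)
```

Because the squared-exponential kernel is positive definite, the matrix admits a Cholesky factorization for any kernel size. In practice the `variance` placeholder would presumably need to be matched to the weight scale the network expects.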
Findings
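The covariance structure shows a degree of model-independence: covariances computed from small networks can initialize larger networks of different depths, widths, patch sizes, and kernel sizes.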
Covariance-based initialization improves performance over standard univariate random initialization.
In some cases, no training of depthwise filters is needed.
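Covariance-based initialization replaces independent per-weight draws with a multivariate Gaussian over each filter's k×k taps. Here is a minimal NumPy sketch. It assumes a covariance matrix of shape (k*k, k*k) is already available, for example one estimated from a trained layer. `sample_filters` is a hypothetical helper, and the (channels, 1, k, k) output layout assumes PyTorch's depthwise `Conv2d` convention.

```python
from typing import Optional

import numpy as np


def sample_filters(cov: np.ndarray, channels: int, k: int,
                   rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Draw `channels` depthwise filters from N(0, cov) (hypothetical helper).

    cov: (k*k, k*k) covariance over filter positions.
    Returns weights of shape (channels, 1, k, k).
    """
    if cov.shape != (k * k, k * k):
        raise ValueError("cov must have shape (k*k, k*k)")
    rng = np.random.default_rng() if rng is None else rng
    # One k*k-dimensional Gaussian sample per channel.
    flat = rng.multivariate_normal(np.zeros(k * k), cov, size=channels)
    return flat.reshape(channels, 1, k, k)


# Toy covariance: unit variance, correlation 0.5 between every pair of taps.
k = 3
toy_cov = 0.5 * np.ones((k * k, k * k)) + 0.5 * np.eye(k * k)
w = sample_filters(toy_cov, channels=512, k=k, rng=np.random.default_rng(0))
print(w.shape)  # (512, 1, 3, 3)
```

Since the covariance depends only on the kernel size, the same matrix can initialize a layer of any width. This is consistent with the paper's observation that covariances from small networks transfer to wider ones. With a diagonal covariance, the sketch reduces to ordinary univariate initialization.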
Abstract
Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly-structured operations like convolutions. Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters have notable structure; this presents an opportunity to study their empirical covariances. In this work, we first observe that such learned filters have highly-structured covariance matrices, and moreover, we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks of different depths, widths, patch sizes, and kernel sizes, indicating a degree of model-independence to the covariance structure. Motivated by these findings, we then propose a learning-free multivariate initialization scheme for…
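The empirical covariance studied in the abstract treats each channel's k×k filter as one sample of a k²-dimensional vector. Here is a minimal NumPy sketch. It assumes depthwise weights in PyTorch's (channels, 1, k, k) layout, which for a real model would be obtained with something like `conv.weight.detach().cpu().numpy()`. `filter_covariance` is a hypothetical helper, and the random weights only stand in for a trained layer.

```python
import numpy as np


def filter_covariance(weight: np.ndarray) -> np.ndarray:
    """Empirical covariance of depthwise conv filters (hypothetical helper).

    weight: shape (channels, 1, k, k); each channel's filter is one sample.
    Returns a (k*k, k*k) matrix whose (i, j) entry is the covariance between
    filter positions i and j across channels.
    """
    channels = weight.shape[0]
    flat = weight.reshape(channels, -1)  # (channels, k*k), row-major over (row, col)
    return np.cov(flat, rowvar=False)    # columns are variables, rows are samples


# Random weights stand in for a trained 7x7 depthwise layer with 256 channels.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 1, 7, 7))
cov = filter_covariance(weights)
print(cov.shape)  # (49, 49)
```

For this random stand-in the matrix is close to the identity. The structure the paper reports would show up as large off-diagonal entries when the function is applied to trained filters.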
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks
Methods: ConvNeXt
