The Power of Linear Combinations: Learning with Random Convolutions
Paul Gavrikov, Janis Keuper

TL;DR
This paper shows that many CNNs can perform well without learned filters by using simple linear combinations of random filters, which can improve robustness and reduce overfitting, especially with larger kernels.
Contribution
It demonstrates that random convolution filters combined linearly can replace learned filters in CNNs, challenging traditional training paradigms and highlighting the importance of filter combinations.
Findings
Random filter combinations achieve high accuracy without training
Linear combinations of random filters can regularize and improve robustness
Larger kernel sizes increase the benefit of learned filter combinations
Abstract
Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with the according spatial inductive bias, we question the significance of learned convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random…
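The abstract's point that linear combinations "implemented through efficient convolutions" can recombine random filters rests on linearity. Consider a depthwise convolution with fixed random filters followed by a 1x1 (pointwise) convolution. This pair computes exactly the same thing as one standard convolution whose kernels are weighted sums of those random filters. The pure-Python sketch below checks this numerically. It makes several assumptions that do not come from the paper: a layout with E random filters per input channel (the "expansion rate" mentioned in the reviews), valid padding, and stride 1. The function names (`lc_layer`, `merged_filters`, `standard_conv`) are hypothetical, and this is not the authors' implementation.

```python
import random

# Hypothetical sketch (assumed layout, not the authors' code): a depthwise conv with
# E fixed random k x k filters per input channel, then a 1x1 conv that combines them.

def conv2d_single(x, w):
    """Valid (no padding, stride 1) 2-D cross-correlation of one map x with kernel w."""
    k = len(w)
    h_out, w_out = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + a][j + b] * w[a][b] for a in range(k) for b in range(k))
             for j in range(w_out)] for i in range(h_out)]

def lc_layer(x, random_filters, alpha):
    """x: [C_in][H][W]; random_filters: [C_in][E][k][k] (never updated here);
    alpha: [C_out][C_in*E] (the 1x1 combination weights that would be learned)."""
    # Depthwise stage: channel c is filtered by its own E random filters (index c*E + e).
    feats = [conv2d_single(x[c], f) for c in range(len(x)) for f in random_filters[c]]
    h, w = len(feats[0]), len(feats[0][0])
    # Pointwise stage: each output map is a weighted sum of all depthwise maps.
    return [[[sum(a[m] * feats[m][i][j] for m in range(len(feats)))
              for j in range(w)] for i in range(h)] for a in alpha]

def merged_filters(random_filters, alpha):
    """Fold the 1x1 weights into ordinary kernels: W[o][c] = sum_e alpha[o][c*E+e] * R[c][e]."""
    c_in, e_rate, k = len(random_filters), len(random_filters[0]), len(random_filters[0][0])
    return [[[[sum(a[c * e_rate + e] * random_filters[c][e][p][q] for e in range(e_rate))
               for q in range(k)] for p in range(k)] for c in range(c_in)] for a in alpha]

def standard_conv(x, weights):
    """Regular multi-channel convolution; weights is [C_out][C_in][k][k]."""
    outs = []
    for w_o in weights:
        maps = [conv2d_single(x[c], w_o[c]) for c in range(len(x))]
        outs.append([[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
                     for i in range(len(maps[0]))])
    return outs

random.seed(0)
C_IN, C_OUT, E, K, H = 2, 3, 4, 3, 6
x = [[[random.gauss(0, 1) for _ in range(H)] for _ in range(H)] for _ in range(C_IN)]
R = [[[[random.gauss(0, 1) for _ in range(K)] for _ in range(K)] for _ in range(E)]
     for _ in range(C_IN)]
alpha = [[random.gauss(0, 1) for _ in range(C_IN * E)] for _ in range(C_OUT)]

y_lc = lc_layer(x, R, alpha)
y_std = standard_conv(x, merged_filters(R, alpha))
max_diff = max(abs(a - b) for o in range(C_OUT) for i in range(H - K + 1)
               for a, b in zip(y_lc[o][i], y_std[o][i]))
print(f"max |LC - merged| = {max_diff:.1e}")
```

The two paths agree up to floating-point rounding. In this view, training only `alpha` means searching over kernels that lie in the span of the fixed random filters.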
Peer Reviews
Decision: Submitted to ICLR 2024
The presentation of the paper is very good. All the findings are provided in a digestible and organized way, making the paper an interesting, intriguing and enjoyable read. The study is carried out in a systematic way, shedding light on the modus operandi of CNNs. Altogether I believe this is a very strong submission with a minor flaw in its evaluation.
The only weakness I see is that the experiments in Section 5.2 are not conclusive. I would encourage the authors to try to improve this part, as it weakens the analysis of the paper, which is very strong up to this point.
* The paper is well-written overall, with a good background on both the recent developments in convolutional kernel sizes and the usage of 1x1 convolutions.
* The experimental setup is quite good overall (with some exceptions listed below), with appropriate models and datasets used to provide convincing empirical evidence for the authors' hypothesis.
* The robustness results in the paper are perhaps the most novel/insightful part of the paper, and would benefit from more analysis/fur
* The hypothesis and results are not surprising at all given how many linear combinations the authors learn in replacement of learned spatial filters. We already know that we can learn to reconstruct anything in a space by learning a linear combination of orthogonal basis vectors. While the basis vectors in this case are not designed to be orthogonal, they are randomly sampled from a high-dimensional space (e.g. the 3x3xC kernel space, where C is often very large). Randomly sampl
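The reviewer's spanning argument can be made concrete. A k x k filter space has dimension k², and k² filters drawn from a Gaussian are linearly independent with probability 1. Any filter, including a hand-designed one, is therefore an exact linear combination of them. The sketch below illustrates this for a single-channel 3x3 filter in pure Python and ignores the channel dimension C that the reviewer mentions. It illustrates the review's reasoning only and is not an experiment from the paper; `solve` and `fit_with_random_basis` are hypothetical helpers.

```python
import random

# Illustration of the review's argument (not from the paper): d = k*k random filters
# almost surely form a basis of the k x k filter space.

def solve(A, b):
    """Solve A x = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_with_random_basis(target, rng):
    """Express a flattened k*k target filter as sum_i coef[i] * basis[i],
    using exactly k*k random Gaussian filters as the basis."""
    d = len(target)
    basis = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    # Column i of A is random filter i, so A @ coef should equal the target.
    A = [[basis[i][p] for i in range(d)] for p in range(d)]
    coef = solve(A, target)
    recon = [sum(coef[i] * basis[i][p] for i in range(d)) for p in range(d)]
    return coef, recon

rng = random.Random(0)
sobel_x = [-1, 0, 1, -2, 0, 2, -1, 0, 1]  # classic hand-designed 3x3 edge filter, flattened
coef, recon = fit_with_random_basis(sobel_x, rng)
err = max(abs(a - b) for a, b in zip(recon, sobel_x))
print(f"max reconstruction error: {err:.2e}")
```

With fewer than k² random filters, the reconstruction generally becomes approximate rather than exact. This is one way to see why the number of combined filters (the expansion rate) matters.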
The authors are very thorough in positioning their work and contrasting it with previous methods. I appreciate the thorough literature review, as it contributes to a clear context for the current work. The main research question that the authors investigate, namely the possibility of training a CNN without modifying the original spatial convolution weights, is well addressed theoretically and empirically, at least in the setting of image classification. The authors perform a number of relevant ablations ov
- My main objection to this manuscript concerns the clarity of the practical impact of this approach. It seems that in most settings, the proposed approach slightly underperforms traditional CNN architectures. For larger expansion rates, the model may outperform the baseline, but to me the trade-off in computational complexity is unclear. Although the authors' findings are interesting in their own right; it is certainly somewhat surprising that a factorization this drastic is still able to pe
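The computational trade-off raised in the review above can be estimated with simple arithmetic once a layer layout is fixed. The sketch below assumes, as a hypothetical layout that the text above does not confirm, that each k x k convolution is replaced by a depthwise convolution with E fixed random filters per input channel plus a learned 1x1 convolution. Biases and normalization layers are ignored. `LayerCost`, `standard_conv_cost`, and `lc_conv_cost` are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Hypothetical cost model (assumed layout, not from the paper): compares a standard
# k x k conv with "frozen random depthwise conv (expansion E) + learned 1x1 conv".

@dataclass
class LayerCost:
    learnable_params: int
    frozen_params: int
    macs_per_pixel: int  # multiply-accumulates per output spatial position

def standard_conv_cost(c_in, c_out, k):
    n = c_in * c_out * k * k  # every weight is learned and used once per output pixel
    return LayerCost(learnable_params=n, frozen_params=0, macs_per_pixel=n)

def lc_conv_cost(c_in, c_out, k, expansion):
    depthwise = c_in * expansion * k * k  # fixed random spatial filters
    pointwise = c_in * expansion * c_out  # learned linear-combination weights
    return LayerCost(learnable_params=pointwise, frozen_params=depthwise,
                     macs_per_pixel=depthwise + pointwise)

for k in (3, 7, 9):
    for e in (1, 8, 64):
        std, lc = standard_conv_cost(64, 64, k), lc_conv_cost(64, 64, k, e)
        print(f"k={k} E={e}: learnable {lc.learnable_params} vs {std.learnable_params}, "
              f"MACs/pixel {lc.macs_per_pixel} vs {std.macs_per_pixel}")
```

Under these assumptions, the layer has fewer learnable parameters than a standard convolution exactly when E < k². Its compute, however, grows linearly with E. Large expansion rates therefore trade fewer trained spatial weights for more multiply-accumulates.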
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
Methods: Batch Normalization · Residual Connection · Residual Block · Kaiming Initialization · Pointwise Convolution · Convolution
