KernelWarehouse: Rethinking the Design of Dynamic Convolution
Chao Li, Anbang Yao

TL;DR
KernelWarehouse introduces a generalized dynamic convolution method that exploits parameter dependencies within and across layers, enabling larger n values for improved performance and parameter efficiency across various vision models.
Contribution
It redefines dynamic convolution concepts to allow larger n values, enhancing performance and efficiency, and demonstrates effectiveness on multiple datasets and architectures.
Findings
Achieves significant accuracy gains on ImageNet and MS-COCO datasets.
Reduces model size while improving accuracy on ResNet18.
Applicable to both CNNs and Vision Transformers.
Abstract
Dynamic convolution learns a linear mixture of n static kernels weighted with their input-dependent attentions, demonstrating superior performance than normal convolution. However, it increases the number of convolutional parameters by n times, and thus is not parameter efficient. This leads to no research progress that can allow researchers to explore the setting n>100 (an order of magnitude larger than the typical setting n<10) for pushing forward the performance boundary of dynamic convolution while enjoying parameter efficiency. To fill this gap, in this paper, we propose KernelWarehouse, a more general form of dynamic convolution, which redefines the basic concepts of "kernels", "assembling kernels" and "attention function" through the lens of exploiting convolutional parameter dependencies within the same layer and across neighboring layers of a ConvNet. We testify the…
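The baseline that the abstract describes, a linear mixture of n static kernels weighted by input-dependent attentions, can be sketched in a few lines of NumPy. This is a minimal illustration of conventional dynamic convolution, not the KernelWarehouse method and not the authors' code. The attention head (global average pooling followed by one linear map and a softmax), the names `dynamic_conv2d` and `attn_proj`, and all tensor sizes are assumptions made for the example.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def conv2d_valid(x, w):
    """Naive stride-1, no-padding cross-correlation.

    x: (C_in, H, W) input, w: (C_out, C_in, k, k) kernel.
    """
    c_out, c_in, k, _ = w.shape
    h_out = x.shape[1] - k + 1
    w_out = x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, k, k)
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y


def dynamic_conv2d(x, kernels, attn_proj):
    """Mix n static kernels with input-dependent attentions, then convolve.

    kernels:   (n, C_out, C_in, k, k) static kernels.
    attn_proj: (n, C_in) weights of a hypothetical one-layer attention head.
    """
    pooled = x.mean(axis=(1, 2))                       # global average pool, (C_in,)
    alpha = softmax(attn_proj @ pooled)                # n attentions summing to 1
    mixed = np.tensordot(alpha, kernels, axes=(0, 0))  # (C_out, C_in, k, k)
    return conv2d_valid(x, mixed), alpha


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
n, c_in, c_out, k = 4, 3, 8, 3
x = rng.standard_normal((c_in, 16, 16))
kernels = rng.standard_normal((n, c_out, c_in, k, k))
attn_proj = rng.standard_normal((n, c_in))

y, alpha = dynamic_conv2d(x, kernels, attn_proj)

# Parameter cost: a static layer stores one kernel, the dynamic layer stores n.
static_params = c_out * c_in * k * k  # 8 * 3 * 3 * 3 = 216
dynamic_params = kernels.size         # 4 * 216 = 864
```

Because convolution is linear in the kernel, mixing the kernels first and convolving once gives the same output as convolving with each kernel and mixing the results, at the cost of a single convolution. The kernel storage still grows by a factor of n, which is the parameter inefficiency the abstract refers to.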
Peer Reviews
Decision: ICML 2024 Poster
The paper is clearly motivated, and it is easy to understand how it differs from prior work. The experimental results are comprehensive: they cover several architectures, span classification, detection and segmentation, and include good ablations and even runtimes. I appreciate, for example, the inclusion of ConvNeXt-Tiny and runtime measurements. The paper is not trivial from a technical standpoint. There seems to be a significant amount of effort and experimentation involved into making the id
My main issues are: 1) Architectures have evolved a lot from ResNet18/ResNet50, both in research and in industry. 2) Latency is heavily affected. Specifically: Experiments with ResNet or even MobileNet feel a bit out of sync with the current literature in terms of architecture design. I appreciate the inclusion of ConvNeXt-Tiny and, while the ImageNet results show only moderate gains, there are some clear gains for object detection and segmentation. Besides convolutional archi
- The paper investigates dynamic convolution in detail, with thorough experiments that include comparisons with other state-of-the-art methods on different downstream tasks.
- The ablation of the key parameters of the proposed method is given in detail. The paper is well written, with well-organized tables and figures.
- The improvement from KernelWarehouse is limited, as shown in Table 1 and Table 4 when comparing two models with the same parameter budget (+ ODConv (4×) vs + KW (4×)), so the results do not demonstrate the advantage of KernelWarehouse in terms of efficiency and performance.
- The convolutional parameter budget $b$ is an important parameter. How should an appropriate value be chosen for a downstream task? According to the experiments, the effect of $b$ differs across image classification, object detection and instance segmentation.
Altogether I believe this is a very strong submission with some flaws in its presentation. It is a very pleasant read with interesting insights, ablations, visualizations and evaluations. This paper has the potential to have a big impact.
To the best of my assessment, this paper does not have any big weaknesses. However, there are a few things regarding the presentation that would improve the readability and clarity of the proposed method.
* The proposed method is simple, yet, in its current form, the paper presents it in a somewhat convoluted manner. I would encourage the authors to restructure the method section (Sec. 3) such that it is presented in an easier way. I want to emphasize that it is not that the paper is not understandable. I
Taxonomy
Topics: Computational Physics and Python Applications · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
Methods: Convolution
