Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera

TL;DR
The paper introduces PiX, a dynamic channel sampling module for ConvNets that improves efficiency and representation without special implementation, outperforming existing methods in speed and accuracy.
Contribution
PiX is a novel multi-purpose module for dynamic channel sampling that replaces 1x1 convolutions and enhances ConvNet performance and efficiency.
Findings
Replacing 1x1 convolutions with PiX speeds up ResNet by 25% without accuracy loss.
PiX enables ConvNets to learn better data representations than existing attention mechanisms.
PiX achieves state-of-the-art results in network downscaling and dynamic channel pruning.
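To give a sense of why the 1x1 layers are a worthwhile target, the short calculation below counts multiply-accumulate operations (MACs) in one standard ResNet bottleneck block. The layer sizes (56x56 feature map, 256 wide channels, 64 narrow channels) are an assumption taken from the usual ResNet-50 first stage, not from this page, and the helper names are hypothetical. It says nothing about the cost of PiX itself.

```python
def conv_macs(h, w, c_in, c_out, k=1):
    """MACs of a dense k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def bottleneck_macs(h, w, c_wide, c_narrow):
    """Per-layer MACs of a 1x1 squeeze -> 3x3 -> 1x1 expand bottleneck."""
    return {
        "squeeze_1x1": conv_macs(h, w, c_wide, c_narrow, k=1),
        "conv_3x3": conv_macs(h, w, c_narrow, c_narrow, k=3),
        "expand_1x1": conv_macs(h, w, c_narrow, c_wide, k=1),
    }


# Assumed sizes: ResNet-50 style first-stage bottleneck.
macs = bottleneck_macs(56, 56, 256, 64)
share_1x1 = (macs["squeeze_1x1"] + macs["expand_1x1"]) / sum(macs.values())

for name, value in macs.items():
    print(f"{name}: {value:,} MACs")
print(f"share of MACs in 1x1 layers: {share_1x1:.1%}")
```

Under these assumed sizes, the two 1x1 layers together account for a little under half of the block's MACs, which is consistent with the abstract's remark that 1x1 convolutions take up a large portion of the computation.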
Abstract
Channel pruning approaches for convolutional neural networks (ConvNets) deactivate channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions, which account for a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for dynamic channel sampling, namely Pick-or-Mix (PiX), which does not require special implementation. PiX divides a set of channels into subsets and then picks from them, where the picking decision is made dynamically for each pixel based on the input activations. We plug PiX into prominent ConvNet architectures and verify its multi-purpose utility. After replacing the 1x1 channel squeezing layers in ResNet with PiX, the network becomes 25% faster without losing accuracy. We show that PiX allows…
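The "divide into subsets, then pick per pixel" idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The contiguous grouping of channels, the two candidate reductions (per-pixel max and average over a subset, the operators the reviews mention), the gate parameters `w` and `b`, and the 0.5 threshold are all assumptions made for illustration; `pix_sketch` is a hypothetical name.

```python
import numpy as np


def pix_sketch(x, num_subsets, w, b):
    """Squeeze C channels to `num_subsets` channels with a per-pixel pick.

    x: activations of shape (C, H, W).
    w, b: hypothetical gate parameters, shapes (G, G) and (G,), with
          G = num_subsets. Returns (output, gate probability), each of
          shape (G, H, W).
    """
    c, h, wd = x.shape
    if c % num_subsets != 0:
        raise ValueError("C must be divisible by num_subsets")

    # Assumption: subsets are contiguous blocks of C // G channels.
    groups = x.reshape(num_subsets, c // num_subsets, h, wd)
    avg = groups.mean(axis=1)  # per-pixel average over each subset
    mx = groups.max(axis=1)    # per-pixel maximum over each subset

    # Assumption: the gate is a per-pixel linear map of the subset
    # averages followed by a sigmoid, so it depends on the input.
    logits = np.einsum("gk,khw->ghw", w, avg) + b[:, None, None]
    p = 1.0 / (1.0 + np.exp(-logits))

    # Pick max where the gate is confident, otherwise the average.
    return np.where(p > 0.5, mx, avg), p


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))  # 8 channels, 4x4 feature map
out, p = pix_sketch(x, 2, rng.standard_normal((2, 2)), np.zeros(2))
print(out.shape)  # 8 channels squeezed to 2
```

Because both candidate reductions are plain pooling over channels, the sketch uses only dense array operations, which is one way to read the abstract's claim that no special implementation is required.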
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
1. The problem addressed in this paper is important, and the compression of convolutional neural networks is indeed a worthwhile research topic. The rationale behind this paper is also quite reasonable. 2. The writing in this article is clear, making it easy to understand the introduction of the methods and the description of the experiments.
1. The experiments are not comprehensive enough, as the authors only conducted experiments on the ResNet and VGG series of network structures. However, there have been many recent advancements in network structures, such as the EfficientNet series or the ViT series. Conducting experiments on a wider range of network structures would enhance the impact of this paper. 2. The comparisons are not sufficient, as many state-of-the-art pruning methods have not been compared. Moreover, compared to other solut
The paper is very well written and easy to follow. The authors have done a very good job of analyzing the computational cost, memory footprint, and run-time of their proposed solution. The proposed method has been applied to a large number of ConvNet architectures, and the authors report results on EfficientViT in the appendix as well. The evaluation on 4 datasets is thorough and sufficient.
I have several concerns about the evaluation and novelty of the proposed method. - The major component of the proposed method, namely Depth-wise pooling operation has already been proposed in [1] and [2]. The main differentiating factor seems to be the pick operator which learns to dynamically select between Average- and Max-pooling. However, the ablation study presented in the appendix (Table A5) shows that there is no significant increase in performance for having both operators and choosing
1. The paper is clearly written and easy to follow. 2. The proposed PiX module is flexible. It can be used for network downscaling and dynamic channel pruning.
1. There are no new basic operations in the proposed module. SENet by Hu et al. uses global pooling to get the dynamic weights for each channel. CBAM by Woo et al. generates the channel weights considering both max pooling and average pooling. It seems that the proposed module is a combination of SENet, CBAM, and group convolution. Moreover, the improvements compared with SKNet and RepVGG in Table 6 are limited. 2. The proposed module is also related to group convolution.
+: Experimental results show that the proposed PiX module brings performance improvement for backbone models in terms of both accuracy and computational efficiency, while the proposed PiX module could be well generalized to various tasks. +: The proposed method seems simple and easy to implement.
-: I have some doubts about the soundness of the proposed method. Specifically, why can the features in the same group use the same max/average operator? In other words, could the channel sampling probability $p$ for the $i$-th element of $z = gca(X)$ represent all channels in the $i$-th group? Should all channels in the $i$-th group use the same max/average operator? Additionally, I am confused about how different pixels in the same channel adopt different operators. -: The experiments show the proposed PiX modu
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Dilated Convolution · Average Pooling · Sigmoid Activation · Selective Kernel Convolution · Batch Normalization
