CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks
Qihang Yu, Yingwei Li, Jieru Mei, Yuyin Zhou, Alan L. Yuille

TL;DR
CAKES introduces a channel-wise kernel shrinking method for 3D CNNs, enabling more efficient, flexible, and diverse operations that reduce computational costs while maintaining high performance in 3D scene understanding tasks.
Contribution
This paper presents a novel channel-wise kernel shrinking approach and an automatic search space for efficient 3D CNNs, improving flexibility and reducing complexity.
Findings
Outperforms similar-sized models in 3D medical imaging segmentation.
Achieves comparable performance to state-of-the-art with fewer parameters.
Reduces computational costs significantly in video action recognition.
Abstract
3D Convolutional Neural Networks (CNNs) have been widely applied to 3D scene understanding tasks such as video analysis and volumetric image recognition. However, 3D networks are prone to over-parameterization, which incurs expensive computation cost. In this paper, we propose Channel-wise Automatic KErnel Shrinking (CAKES) to enable efficient 3D learning by shrinking standard 3D convolutions into a set of economical operations, e.g., 1D and 2D convolutions. Unlike previous methods, CAKES performs channel-wise kernel shrinkage, which has the following benefits: 1) it enables the operations deployed in every layer to be heterogeneous, so that they can extract diverse and complementary information to benefit the learning process; and 2) it allows for an efficient and flexible replacement design, which can be generalized to both spatial-temporal and volumetric data. Further, we propose a new search…
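To make the cost argument in the abstract concrete, the sketch below counts the weights of a standard 3x3x3 convolution and compares them with a layer whose output channels are divided among 1D, 2D, and 3D kernels. The layer width, the kernel shapes, the channel split, and the helper names `conv_params` and `mixed_params` are all assumptions chosen for illustration; they are not the configuration used or found by CAKES.

```python
# Illustrative only: weight counts for a full 3D convolution versus a
# channel-wise mix of smaller kernels. All numbers below are assumptions.

def conv_params(in_ch, out_ch, kernel):
    """Number of weights (bias excluded) for a convolution with kernel (t, h, w)."""
    kt, kh, kw = kernel
    return in_ch * out_ch * kt * kh * kw

def mixed_params(in_ch, split):
    """Weights when each group of output channels has its own kernel shape.

    split: list of (num_output_channels, kernel_shape) pairs.
    """
    return sum(conv_params(in_ch, n, k) for n, k in split)

in_ch, out_ch = 64, 64
full_3d = conv_params(in_ch, out_ch, (3, 3, 3))

# Hypothetical split: half of the output channels use a 2D kernel (1x3x3),
# a quarter use a 1D kernel (3x1x1), and a quarter keep the full 3D kernel.
split = [
    (32, (1, 3, 3)),
    (16, (3, 1, 1)),
    (16, (3, 3, 3)),
]
shrunk = mixed_params(in_ch, split)
ratio = shrunk / full_3d

print(full_3d, shrunk, round(ratio, 3))
```

Under this assumed split, the mixed layer keeps 4/9 of the weights of the full 3D layer while still containing kernels that cover the temporal (or depth) axis, the spatial plane, and the full volume.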
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · 3D Shape Modeling and Analysis
Methods: Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory · Convolution
