Designing Concise ConvNets with Columnar Stages
Ashish Kumar, Jaesik Park

TL;DR
This paper introduces CoSNet, a simple, resource-efficient convolutional neural network with a novel columnar stage design, achieving competitive performance with fewer parameters and FLOPs, suitable for deployment in resource-limited environments.
Contribution
The paper proposes CoSNet, a new macro design for ConvNets featuring parallel convolutions and columnar stacking, emphasizing simplicity and efficiency over complex architectures.
Findings
CoSNet achieves competitive accuracy with fewer parameters.
It outperforms many ConvNets and Transformers in resource-constrained scenarios.
The design reduces FLOPs and model size significantly.
Abstract
In the era of vision Transformers, the recent success of VanillaNet shows the huge potential of simple and concise convolutional neural networks (ConvNets). While such models mainly focus on runtime, it is also crucial to simultaneously focus on other aspects, e.g., FLOPs and parameters, to strengthen their utility further. To this end, we introduce a refreshing ConvNet macro design called Columnar Stage Network (CoSNet). CoSNet has a systematically developed simple and concise structure, smaller depth, low parameter count, low FLOPs, and attention-less operations, well suited for resource-constrained deployment. The key novelty of CoSNet is deploying parallel convolutions with fewer kernels fed by input replication, using columnar stacking of these convolutions, and minimizing the use of 1x1 convolution layers. Our comprehensive evaluations show that CoSNet rivals many renowned…
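To make the parameter argument in the abstract concrete, the following is a minimal counting sketch, not the authors' implementation. It assumes a hypothetical stage of `depth` stacked k x k convolutions with `channels` channels, and compares it with `columns` parallel columns of width `channels / columns`, where every column receives the full replicated input in its first layer and stays narrow afterwards. The function names, the layer layout, and the example sizes are all illustrative assumptions; the paper's actual configuration, including its 1x1 layers and fusion between columns, is not modeled here.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def single_stack_params(channels: int, depth: int, k: int = 3) -> int:
    """Baseline: `depth` full-width convolutions stacked serially."""
    return depth * conv_params(channels, channels, k)


def columnar_stack_params(channels: int, columns: int, depth: int, k: int = 3) -> int:
    """Hypothetical columnar stage (an assumption, not the paper's exact design).

    Each of the `columns` parallel columns sees the full replicated input
    in its first layer, then stacks narrow convolutions of width
    channels // columns.
    """
    if channels % columns != 0:
        raise ValueError("channels must be divisible by columns")
    width = channels // columns
    # First layer: every column maps the full input to `width` kernels.
    first = columns * conv_params(channels, width, k)
    # Remaining layers: each column stays at its own narrow width.
    rest = (depth - 1) * columns * conv_params(width, width, k)
    return first + rest


if __name__ == "__main__":
    base = single_stack_params(channels=64, depth=3)
    cols = columnar_stack_params(channels=64, columns=4, depth=3)
    print(base, cols, cols / base)
```

Under these assumptions the first layer costs the same as a full-width convolution, and the saving comes from the stacked layers inside each column, which shrink by a factor equal to the number of columns.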
Peer Reviews
Decision: ICLR 2025 Poster
This paper revisits some of the fundamental design ideas in ConvNets and proposes several interesting ones. 1. The shallow-deep projection is quite interesting. It inherits ideas from ResNet and extends them to deep connections. 2. It achieves competitive performance (accuracy and latency) with reduced network depth and parameter count. 3. It also introduces pairwise frequent fusion (PFF) to fuse information across different columns.
Please refer to the questions section, where some clarity or more experiments would be great.
1. The writing is easy to read and clearly explains everything in the paper. 2. The experimental results are good compared to previous work. Empirically, the method seems to offer strong accuracy compared to existing methods with similar architectures.
1. Some details are missing. For example, how is the number of parallel convolutions M determined? I think that different values of M will affect the performance. Please explain these details in the text. Another minor issue: Section 3.4 is missing from Figure 2 (c) and should be added. 2. How does a design choice such as "input replication" improve performance? The authors need to give some details in the manuscript. 3. The related work is comprehensive. However, the authors only highl
1. The motivation is reasonable. 2. The result is comparable with the state-of-the-art. 3. The paper is easy to understand.
1. Typo: VanillaNet was published in NeurIPS 2023, not 2024. 2. Recent comparison methods are missing: all compared models were published in 2023 or earlier. The authors should provide more comparisons, such as InceptionNeXt [1] and UniRepLKNet [2]. 3. The Top-1 accuracy of EfficientNet-B0 is 76.3 [3] or 77.1 [4], but the authors report a much lower result of 75.1. Similar problems also occur for ConvNeXt-T (82.1 in [5] but 81.8 in this paper) and EfficientViT-M5 (77.1 in [6] but 76.8 in this
Taxonomy
Topics: Online Learning and Analytics
Methods: Attention Is All You Need · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings
