Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition
Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao

TL;DR
This paper introduces pyramidal convolution (PyConv), a multi-scale convolution that improves visual recognition without increasing computational cost or parameter count relative to standard convolution, with gains reported across several computer vision benchmarks.
Contribution
PyConv is a novel multi-scale convolutional method that captures different scene details efficiently, enabling improved performance across multiple visual recognition tasks.
Findings
A 50-layer PyConv network outperforms ResNet-152 on ImageNet with fewer parameters and lower computational complexity.
Sets new state-of-the-art on ADE20K scene parsing benchmark.
Achieves significant improvements in image classification, detection, and segmentation tasks.
Abstract
This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost and parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection…
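The claim that a pyramid of larger kernels need not cost more than a standard convolution can be checked with simple arithmetic. The sketch below assumes the varying kernel "depth" is realized with grouped convolution, so that larger kernels connect to fewer input channels. The function names, the 64-channel layer, and the kernel sizes and group counts in `LEVELS` are illustrative choices, not values taken from the text above.

```python
def conv_params(kernel_size, in_channels, out_channels, groups=1):
    """Weight count of a 2D convolution (bias terms ignored)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees in_channels // groups input channels.
    return kernel_size * kernel_size * (in_channels // groups) * out_channels


def pyconv_params(in_channels, levels):
    """Total weights over pyramid levels given as (kernel_size, out_channels, groups)."""
    return sum(conv_params(k, in_channels, c_out, g) for k, c_out, g in levels)


# Hypothetical 4-level pyramid (assumption): each level produces 16 of the
# 64 output channels, and larger kernels use more groups (shallower kernels).
LEVELS = [(3, 16, 1), (5, 16, 4), (7, 16, 8), (9, 16, 16)]

standard = conv_params(3, 64, 64)    # plain 3x3 convolution, 64 -> 64 channels
pyramid = pyconv_params(64, LEVELS)  # pyramid with the same input/output widths

print(standard, pyramid)  # 36864 27072
```

Under these assumed settings the pyramid uses fewer weights than the single 3x3 convolution while covering receptive fields up to 9x9. Because the number of multiply-accumulate operations scales with the weight count at a fixed output resolution, the same comparison holds for computational cost.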
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
Methods: 1x1 Convolution · Bottleneck Residual Block · Batch Normalization · Average Pooling · Max Pooling · Global Average Pooling · Residual Connection · Kaiming Initialization · Convolution
