Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

Ionut Cosmin Duta; Li Liu; Fan Zhu; Ling Shao

arXiv:2006.11538 · cs.CV · June 23, 2020 · 138 cites

TL;DR

This paper introduces pyramidal convolution (PyConv), a multi-scale convolution that improves visual recognition without increasing computational cost or parameter count relative to standard convolution, with significant gains reported across several computer vision benchmarks.

Contribution

PyConv is a multi-scale convolution built from a pyramid of kernels of varying size and depth. Each level captures a different level of scene detail, which improves performance across multiple visual recognition tasks while remaining efficient.

Findings

01 Outperforms ResNet-152 on ImageNet with fewer parameters and lower computational complexity.

02 Sets a new state of the art on the ADE20K scene parsing benchmark.

03 Achieves significant improvements in image classification, detection, and segmentation.

Abstract

This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost and parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection…
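The claim that a pyramid of kernels need not cost more than a standard convolution can be checked with simple parameter arithmetic. The sketch below is illustrative only: it assumes the larger kernels use grouped convolution (so that kernel depth shrinks as kernel size grows), and the specific kernel sizes, group counts, and channel split in `PYRAMID_LEVELS` are hypothetical choices, not values stated in this summary. The helper names `conv_params` and `pyramid_params` are likewise made up for the example.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1):
    """Weight count of a 2D convolution layer (bias omitted).

    With grouped convolution, each filter only sees in_channels // groups
    input channels, which is what reduces the kernel depth.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel_size * kernel_size * (in_channels // groups) * out_channels


# Hypothetical pyramid: (kernel_size, groups, out_channels) per level.
# Larger kernels get more groups, i.e. shallower kernels (an assumption).
PYRAMID_LEVELS = [
    (3, 1, 16),
    (5, 4, 16),
    (7, 8, 16),
    (9, 16, 16),
]


def pyramid_params(in_channels, levels):
    """Total weights when the level outputs are concatenated channel-wise."""
    return sum(conv_params(in_channels, out_ch, k, g) for k, g, out_ch in levels)


# Baseline: a single standard 3x3 convolution, 64 -> 64 channels.
standard = conv_params(64, 64, 3)
pyramid = pyramid_params(64, PYRAMID_LEVELS)

print(standard, pyramid)  # prints: 36864 27072
```

Under these assumed settings the four-level pyramid produces the same 64 output channels with fewer weights than the single 3x3 layer. At stride 1 with matching output resolution, multiply-accumulate counts scale with the same numbers, so the comparison carries over to compute cost.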

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis

Methods: 1x1 Convolution · Bottleneck Residual Block · Batch Normalization · Average Pooling · Max Pooling · Global Average Pooling · Residual Connection · Kaiming Initialization · Convolution