On the Exploration of Convolutional Fusion Networks for Visual Recognition
Yu Liu, Yanming Guo, and Michael S. Lew

TL;DR
This paper introduces convolutional fusion networks (CFN), an efficient method for multi-scale deep representation fusion that improves performance on various visual recognition tasks with fewer parameters.
Contribution
The paper proposes CFN, a novel fusion approach using 1x1 convolutions and a locally-connected module for adaptive, discriminative feature fusion, enhancing multi-scale deep representations.
Findings
CFN models trained on CIFAR and ImageNet improve accuracy over plain CNN baselines.
CFN transfers to scene recognition, fine-grained recognition, and image retrieval, with consistent improvements on each task.
CFN requires fewer parameters than traditional multi-scale fusion methods.
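To make the parameter claim concrete, the following is a rough back-of-the-envelope sketch, not a figure from the paper. It counts the weights of a 1x1-convolution side branch followed by global average pooling and compares them with a hypothetical side branch that flattens the same feature map into a fully-connected layer. The layer sizes, the function names, and the fully-connected baseline are all assumptions chosen for illustration. The fusion count assumes one weight per branch per feature dimension, which is one reading of "locally-connected".

```python
def conv1x1_params(c_in, c_out):
    """Weights plus biases of a 1x1 convolution (independent of H and W)."""
    return c_in * c_out + c_out


def fc_params(h, w, c_in, c_out):
    """Weights plus biases of a fully-connected layer on a flattened H*W*C_in map.

    Hypothetical baseline for comparison only.
    """
    return h * w * c_in * c_out + c_out


def lc_fusion_params(num_branches, k):
    """Assumed locally-connected fusion: one weight per (branch, dimension) plus a bias per dimension."""
    return num_branches * k + k


# Illustrative sizes (assumptions): an 8x8 feature map with 64 channels,
# projected to 64 dimensions, and 3 side branches.
side_conv = conv1x1_params(64, 64)   # 4160; global average pooling adds no parameters
side_fc = fc_params(8, 8, 64, 64)    # 262208
fusion = lc_fusion_params(3, 64)     # 256

print(side_conv, side_fc, fusion)
```

Under these assumed sizes the 1x1-convolution branch is about 63 times smaller than the flattened fully-connected one, because its cost does not grow with the spatial size of the feature map.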
Abstract
Despite recent advances in multi-scale deep representations, their limitations are attributed to expensive parameters and weak fusion modules. Hence, we propose an efficient approach to fuse multi-scale deep representations, called convolutional fusion networks (CFN). By using 1x1 convolution and global average pooling, CFN can efficiently generate the side branches while adding few parameters. In addition, we present a locally-connected fusion module, which can learn adaptive weights for the side branches and form a discriminatively fused feature. CFN models trained on the CIFAR and ImageNet datasets demonstrate remarkable improvements over the plain CNNs. Furthermore, we generalize CFN to three new tasks: scene recognition, fine-grained recognition, and image retrieval. Our experiments show that it obtains consistent improvements on these transfer tasks.
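The pipeline the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each side branch applies a 1x1 convolution and global average pooling to an intermediate feature map, and the fusion step is assumed to be a per-dimension weighted sum over branches. All names, shapes, and initial values are hypothetical, and nonlinearities and training are omitted.

```python
import numpy as np


def conv1x1(x, w, b):
    """1x1 convolution on a feature map x of shape (H, W, C_in).

    A 1x1 convolution is a per-pixel linear map over channels, so it
    reduces to a matrix product with w of shape (C_in, C_out).
    """
    return x @ w + b


def global_average_pool(x):
    """Average over the two spatial axes: (H, W, C) -> (C,)."""
    return x.mean(axis=(0, 1))


def side_branch(x, w, b):
    """Map a feature map of any spatial size to a K-dimensional vector."""
    return global_average_pool(conv1x1(x, w, b))


def locally_connected_fusion(branches, weights, bias):
    """Assumed fusion: a separate weight for every (branch, dimension) pair.

    branches and weights have shape (S, K); bias has shape (K,).
    """
    return (branches * weights).sum(axis=0) + bias


# Hypothetical feature maps from three depths of a CNN (shapes are assumptions).
rng = np.random.default_rng(0)
K = 10
shapes = [(32, 32, 16), (16, 16, 32), (8, 8, 64)]
feature_maps = [rng.standard_normal(s) for s in shapes]
params = [(0.1 * rng.standard_normal((s[2], K)), np.zeros(K)) for s in shapes]

branches = np.stack(
    [side_branch(x, w, b) for x, (w, b) in zip(feature_maps, params)]
)

# Equal weights reduce the fusion to a plain average of the branches;
# in a trained model these weights would be learned.
fusion_w = np.full(branches.shape, 1.0 / len(shapes))
fused = locally_connected_fusion(branches, fusion_w, np.zeros(K))
print(fused.shape)
```

Because the fusion weights are not shared across feature dimensions, each dimension of the fused vector can favor a different scale, which is one way to read "adaptive weights for the side branches".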
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Convolution
