Is Second-order Information Helpful for Large-scale Visual Recognition?
Peihua Li, Jiangtao Xie, Qilong Wang, Wangmeng Zuo

TL;DR
This paper introduces a covariance pooling method called MPN-COV for convolutional networks, leveraging second-order feature statistics to improve large-scale visual recognition performance.
Contribution
The paper proposes an end-to-end trainable covariance pooling technique that captures second-order information and addresses robust covariance estimation from a small sample of high-dimensional features.
Findings
Achieved over 4% accuracy gain on AlexNet with MPN-COV.
Improved VGG-16 performance by approximately 2.5%.
With a 50-layer ResNet backbone, MPN-COV outperformed ResNet-101 and was comparable to ResNet-152 on ImageNet.
Abstract
By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing full potentials of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring feature statistics higher than first-order. We take a step towards addressing this problem. Our method consists in covariance pooling, instead of the most commonly used first-order pooling, of high-level convolutional features. The main challenges involved are robust covariance estimation given a small sample of large-dimensional features and usage of the manifold structure of covariance matrices.…
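The covariance pooling step described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the 1/n normalization of the covariance, and the exponent `alpha=0.5` are assumptions made for this example. It covers the forward pass only and omits the backpropagation that end-to-end training requires.

```python
import numpy as np


def covariance_pool(features):
    """Sample covariance of a (d, n) feature matrix: d channels, n spatial locations."""
    n = features.shape[1]
    centered = features - features.mean(axis=1, keepdims=True)
    # Assumption: normalize by n (biased estimate) rather than n - 1.
    return centered @ centered.T / n


def matrix_power_normalize(cov, alpha=0.5):
    """Raise a symmetric PSD matrix to the power alpha via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Clip tiny negative eigenvalues caused by round-off before taking the power.
    eigvals = np.clip(eigvals, 0.0, None)
    # Equivalent to U @ diag(eigvals ** alpha) @ U.T
    return (eigvecs * eigvals ** alpha) @ eigvecs.T


def mpn_cov_descriptor(features, alpha=0.5):
    """Covariance pooling followed by matrix power normalization.

    Returns the upper triangle of the normalized matrix as a flat vector,
    since the matrix is symmetric.
    """
    normalized = matrix_power_normalize(covariance_pool(features), alpha)
    upper = np.triu_indices(normalized.shape[0])
    return normalized[upper]
```

With d channels the descriptor has d(d+1)/2 entries, compared with d entries for first-order average pooling. When n is smaller than d, as in the small-sample setting the abstract mentions, the covariance matrix is rank deficient, which is why the sketch clips eigenvalues at zero before applying the power.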
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Matrix-power Normalization · Average Pooling · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Softmax · 1x1 Convolution
