Improved Bilinear Pooling with CNNs
Tsung-Yu Lin, Subhransu Maji

TL;DR
This paper improves bilinear CNN pooling by applying matrix square-root normalization, which raises accuracy on fine-grained recognition tasks, and it proposes efficient gradient computation methods for training.
Contribution
It demonstrates that matrix square-root normalization significantly boosts bilinear pooling performance and introduces faster, accurate gradient computation techniques for network training.
Findings
Matrix square-root normalization improves recognition accuracy by 2-3% on a range of fine-grained recognition datasets.
Approximating the matrix square root with a few Newton iterations is faster than an SVD-based computation and equally effective.
Numerical inaccuracies in SVD gradients have negligible impact on final accuracy.
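The Newton-iteration finding can be illustrated with a minimal NumPy sketch of the coupled Newton-Schulz iteration, a standard way to approximate a matrix square root using only matrix multiplications. This is an illustration under stated assumptions, not the authors' implementation: the function name `newton_schulz_sqrt`, the Frobenius-norm pre-scaling, and the default iteration count are choices made here, and the input is assumed to be symmetric positive definite.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=10):
    """Approximate the square root of a symmetric positive-definite matrix.

    Hypothetical helper for illustration only. Uses the coupled
    Newton-Schulz iteration, which needs matrix products but no SVD.
    """
    d = A.shape[0]
    I = np.eye(d)
    # Scale by the Frobenius norm so all eigenvalues lie in (0, 1],
    # which keeps the iteration inside its convergence region.
    norm = np.linalg.norm(A)
    Y = A / norm      # converges to sqrt(A / norm)
    Z = I.copy()      # converges to the inverse of sqrt(A / norm)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    # Undo the pre-scaling: sqrt(A) = sqrt(norm) * sqrt(A / norm).
    return Y * np.sqrt(norm)
```

Because the loop consists only of matrix multiplications, it maps well onto GPU hardware. Matrices with eigenvalues very close to zero converge slowly, so adding a small multiple of the identity before calling the function may help; that regularization is also an assumption of this sketch.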
Abstract
Bilinear pooling of Convolutional Neural Network (CNN) features [22, 23], and their compact variants [10], have been shown to be effective at fine-grained recognition, scene categorization, texture recognition, and visual question-answering tasks among others. The resulting representation captures second-order statistics of convolutional features in a translationally invariant manner. In this paper we investigate various ways of normalizing these statistics to improve their representation power. In particular we find that the matrix square-root normalization offers significant improvements and outperforms alternative schemes such as the matrix logarithm normalization when combined with elementwise square-root and l2 normalization. This improves the accuracy by 2-3% on a range of fine-grained recognition datasets leading to a new state of the art. We also investigate how the accuracy of…
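The normalization order described in the abstract (matrix square root, then elementwise square root, then l2 normalization) can be sketched in NumPy as follows. This is a rough illustration, not the paper's code: the function name `improved_bilinear_pool`, the use of a signed elementwise square root, the averaging over locations, and the eigendecomposition-based matrix square root are all assumptions made here, and the input is taken to be an array of CNN features with one row per spatial location.

```python
import numpy as np

def improved_bilinear_pool(features, eps=1e-12):
    """Pool an (n_locations, d) feature array into a normalized d*d vector.

    Hypothetical helper for illustration only.
    """
    n, d = features.shape
    # Second-order statistics: average outer product over locations.
    # Averaging discards where each feature occurred, so the result
    # does not change if the locations are shuffled or shifted.
    A = features.T @ features / n
    # Matrix square root via eigendecomposition of the symmetric PSD matrix.
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)        # guard against tiny negative eigenvalues
    sqrt_A = (V * np.sqrt(w)) @ V.T
    # Elementwise signed square root, then l2 normalization.
    z = sqrt_A.reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + eps)
```

Summing instead of averaging the outer products would change the pooled matrix only by a scale factor, which the final l2 normalization removes.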
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
