Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
Yue Song, Nicu Sebe, Wei Wang

TL;DR
This paper investigates why the approximate matrix square root computed via Newton-Schulz iteration outperforms the accurate one computed via SVD in global covariance pooling (GCP) for CNNs. It proposes a hybrid training protocol and a new GCP meta-layer with improved performance.
Contribution
It provides an empirical analysis of the performance gap between approximate and accurate matrix square roots in GCP, and introduces a novel GCP meta-layer that uses Padé approximants to obtain better gradients.
Findings
Approximate matrix square root via Newton-Schulz outperforms SVD in GCP.
Hybrid training protocol improves SVD-based GCP performance.
Proposed GCP meta-layer achieves state-of-the-art results.
Abstract
Global covariance pooling (GCP) aims at exploiting the second-order statistics of the convolutional feature. Its effectiveness has been demonstrated in boosting the classification performance of Convolutional Neural Networks (CNNs). Singular Value Decomposition (SVD) is used in GCP to compute the matrix square root. However, the approximate matrix square root calculated using Newton-Schulz iteration \cite{li2018towards} outperforms the accurate one computed via SVD \cite{li2017second}. We empirically analyze the reason behind the performance gap from the perspectives of data precision and gradient smoothness. Various remedies for computing smooth SVD gradients are investigated. Based on our observation and analyses, a hybrid training protocol is proposed for SVD-based GCP meta-layers such that competitive performances can be achieved against Newton-Schulz iteration. Moreover, we propose…
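To make the two approaches in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a symmetric positive definite covariance matrix, trace normalization before the coupled Newton-Schulz iteration, and uses a symmetric eigendecomposition (which coincides with the SVD for such matrices) as the accurate reference. The function names and the iteration count are illustrative choices.

```python
import numpy as np


def sqrt_newton_schulz(A, num_iters=5):
    """Approximate square root of an SPD matrix via coupled Newton-Schulz iteration."""
    n = A.shape[0]
    I = np.eye(n)
    # Assumption: normalize by the trace so all eigenvalues lie in (0, 1],
    # which keeps the iteration inside its region of convergence.
    tr = np.trace(A)
    Y = A / tr
    Z = I.copy()
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T  # Y converges to sqrt(A / tr)
        Z = T @ Z  # Z converges to the inverse of sqrt(A / tr)
    # Undo the normalization.
    return np.sqrt(tr) * Y


def sqrt_eigh(A):
    """Accurate square root of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 8))  # 64 hypothetical feature vectors of dimension 8
    cov = X.T @ X / X.shape[0]
    approx = sqrt_newton_schulz(cov, num_iters=5)
    exact = sqrt_eigh(cov)
    print("max abs difference:", np.abs(approx - exact).max())
```

With only a few iterations the Newton-Schulz result is an approximation whose error depends on the conditioning of the matrix; more iterations bring it closer to the eigendecomposition result. The sketch covers the forward pass only and does not reproduce the gradient analysis the paper describes.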
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Machine Learning and ELM
