Sparse Covariance Neural Networks
Andrea Cavallo, Zhan Gao, Elvin Isufi

TL;DR
This paper introduces Sparse coVariance Neural Networks (S-VNNs), which sparsify the sample covariance matrix before it is used in the VNN architecture, aiming to remove spurious correlations and to improve stability, computational efficiency, and task performance.
Contribution
The paper proposes the S-VNN framework, which applies sparsification techniques to the sample covariance matrix to improve the stability and efficiency of covariance neural networks (VNNs).
Findings
S-VNNs outperform traditional VNNs in stability and accuracy.
Sparsification reduces computational cost significantly.
Experimental results confirm improved performance across multiple datasets.
Abstract
Covariance Neural Networks (VNNs) perform graph convolutions on the covariance matrix of input data to leverage correlation information as pairwise connections. They have achieved success in a multitude of applications such as neuroscience, financial forecasting, and sensor networks. However, the empirical covariance matrix on which VNNs operate typically contains spurious correlations, creating a mismatch with the actual covariance matrix that degrades VNNs' performance and computational efficiency. To tackle this issue, we put forth Sparse coVariance Neural Networks (S-VNNs), a framework that applies sparsification techniques on the sample covariance matrix and incorporates the latter into the VNN architecture. We investigate the S-VNN when the underlying data covariance matrix is both sparse and dense. When the true covariance matrix is sparse, we propose hard and soft thresholding…
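The abstract names hard and soft thresholding of the sample covariance matrix and graph convolutions on that matrix. The sketch below illustrates both ideas in NumPy under stated assumptions: the function names are hypothetical, the threshold `tau` is a free parameter rather than the paper's rule, the diagonal (variances) is left untouched, and the filter uses the polynomial form sum_k h[k] C^k x that is standard for graph convolutions. It is not the authors' implementation.

```python
import numpy as np


def hard_threshold(C, tau):
    """Keep entries with |c_ij| >= tau, zero the rest.

    Assumption: the diagonal (variances) is always preserved.
    """
    S = np.where(np.abs(C) >= tau, C, 0.0)
    np.fill_diagonal(S, np.diag(C))
    return S


def soft_threshold(C, tau):
    """Shrink every off-diagonal entry toward zero by tau, clipping at zero.

    Assumption: the diagonal (variances) is always preserved.
    """
    S = np.sign(C) * np.maximum(np.abs(C) - tau, 0.0)
    np.fill_diagonal(S, np.diag(C))
    return S


def covariance_filter(C, x, h):
    """Apply the polynomial filter sum_k h[k] * C^k @ x.

    C is the (possibly sparsified) covariance matrix, x a data vector,
    and h the list of filter coefficients (learned in a real network).
    """
    out = np.zeros_like(x, dtype=float)
    z = x.astype(float)
    for hk in h:
        out += hk * z  # add the order-k term
        z = C @ z      # advance to the next power of C
    return out


# Toy 3x3 covariance with one weak off-diagonal entry (0.05).
C = np.array([[2.0, 0.5, 0.05],
              [0.5, 1.0, -0.3],
              [0.05, -0.3, 1.5]])

C_hard = hard_threshold(C, tau=0.2)
C_soft = soft_threshold(C, tau=0.2)
y = covariance_filter(C_hard, np.array([1.0, 0.0, 0.0]), h=[1.0, 0.5])
```

With a sparse matrix format, each product `C @ z` costs time proportional to the number of nonzero entries, which is where the efficiency gain from sparsification would come from; the dense arrays here are only for readability.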
Peer Reviews
Decision: Submitted to ICLR 2025
- The idea of sparsifying the covariance before feeding it to the network is sound.
- This paper is very hard to follow. Several variables are not defined ($V$, $u$, $u_g$, $h_{klfg}$, etc.). Several concepts must be guessed from the text (what are the covariance filters? how does $u$ relate to the $u_g$ in eq. 1? what are the per covariance filters in theorem 1?). Overall, it is hard to understand this paper alone without reading the original paper on covariance neural networks. This paper needs a major rewriting before being ready for publication, which explains my note.
**Strengths:** - The proposed architectures come with apparent theoretical guarantees on the closeness of hidden representations or predictions when applied to the approximate versus true covariance matrices. These distances become smaller as the number of samples increases, at a rate of $\mathcal{O}(t^{-1/2})$, ignoring some parameter-dependent constants. - Experiments are provided showing the predictive performance and stability of the proposed methods. These experiments agree with the theory, an
**Weaknesses:** - Comparing Lemma 1 and Theorem 1. I am guessing (please correct me if I am wrong!) that somehow hidden inside $\mathcal{O}$ in Theorem 1 are the parameters $\mathcal{H}$. Intuitively, these should learn something similar to PCA, and if the eigenvalues of PCA are close, this constant term in $\mathcal{O}$ will be bad but in terms of $\mathcal{H}$. In Lemma 1, the constant term in terms of the small gap eigenvalues is explicitly given, and it is obvious how this causes instability
The presentation of the paper is clear.
1. It is well known that the covariance matrix is not invariant to the scale of the data, making it impractical to set a common threshold for different elements $c_{ij}$. Thus, the rationale behind Definition 2 is difficult to justify. 2. From a sparsity perspective, the elements $c_{ij}$ should be classified into two categories: zero and nonzero. However, this key point is obscured in Theorem 2, making the results difficult to interpret. In other words, it is challenging to justify that the p
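The reviewer's first point, that a single threshold behaves differently once the data are rescaled, can be illustrated with a small deterministic example. This is only a sketch of the reviewer's concern, not of the paper's Definition 2: the helper names are hypothetical, and the correlation matrix is shown as one common scale-free alternative, not as something the authors use.

```python
import numpy as np


def rescale_covariance(C, scales):
    """Covariance after multiplying feature i by scales[i]: D @ C @ D."""
    D = np.diag(scales)
    return D @ C @ D


def correlation(C):
    """Scale-free version of C: divide c_ij by sqrt(c_ii * c_jj)."""
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)


def support(C, tau):
    """Boolean mask of entries that survive a common threshold tau."""
    return np.abs(C) >= tau


# Two unit-variance features with covariance 0.3.
C = np.array([[1.0, 0.3],
              [0.3, 1.0]])

# Same data, but feature 0 is now measured in units 10 times smaller.
C_rescaled = rescale_covariance(C, scales=[10.0, 1.0])

tau = 0.5
kept_before = support(C, tau)[0, 1]          # 0.3 < 0.5, entry is dropped
kept_after = support(C_rescaled, tau)[0, 1]  # 3.0 >= 0.5, entry is kept

# The correlation matrix is identical in both cases.
same_corr = np.allclose(correlation(C), correlation(C_rescaled))
```

The same pair of features is pruned or kept depending only on the measurement units, while its correlation (0.3) does not change.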
Taxonomy
Topics: Neural Networks and Applications
