A Unified Theory of Quantum Neural Network Loss Landscapes
Eric R. Anschuetz

TL;DR
This paper introduces a unified theoretical framework for quantum neural network (QNN) loss landscapes. It models QNNs as Wishart processes in order to analyze trainability, gradient behavior, and the distribution of minima, giving QNNs an analogue of the Gaussian process description available for classical neural networks.
Contribution
It proves that QNNs and their first two derivatives generally form Wishart processes, gives necessary and sufficient conditions for a Gaussian process limit, and calculates the gradient and minima distributions.
Findings
QNNs form Wishart processes instead of Gaussian processes.
Derived conditions for Gaussian process limits in QNNs.
Calculated gradient distributions and minima distributions for QNNs.
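To make the Wishart-versus-Gaussian distinction concrete, here is a minimal Python sketch. It is not the paper's construction: it assumes a plain real Wishart matrix W = G^T G with identity scale, whereas the paper's process has hyperparameters set by algebraic properties of the network. The function names are hypothetical. The sketch only illustrates the standard fact that a Wishart entry is visibly non-Gaussian at few degrees of freedom and approaches a Gaussian shape as the degrees of freedom grow.

```python
import numpy as np

def wishart_diagonal_samples(dof, n_samples, rng, dim=2):
    """Sample the (0, 0) entry of W = G^T G, where G is a (dof x dim)
    matrix of standard normals (a real Wishart matrix, identity scale)."""
    g = rng.standard_normal((n_samples, dof, dim))
    w = np.einsum("nki,nkj->nij", g, g)  # batch of dim x dim Wishart matrices
    return w[:, 0, 0]

def skewness(x):
    """Sample skewness: zero for a Gaussian, positive for a right-skewed law."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3))

rng = np.random.default_rng(0)
skew_small = skewness(wishart_diagonal_samples(2, 5000, rng))
skew_large = skewness(wishart_diagonal_samples(200, 5000, rng))

# The diagonal entry is chi-squared with `dof` degrees of freedom, whose
# exact skewness is sqrt(8 / dof): 2.0 at dof=2 and 0.2 at dof=200.
print(f"skewness, dof=2:   {skew_small:.2f}")
print(f"skewness, dof=200: {skew_large:.2f}")
```

The skewness shrinking toward zero is one simple symptom of a Gaussian limit; the conditions the paper derives for QNN architectures are more specific than this toy check.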
Abstract
Classical neural networks with random initialization famously behave as Gaussian processes in the limit of many neurons, which allows one to completely characterize their training and generalization behavior. No such general understanding exists for quantum neural networks (QNNs), which -- outside of certain special cases -- are known to not behave as Gaussian processes when randomly initialized. We here prove that QNNs and their first two derivatives instead generally form what we call "Wishart processes," where certain algebraic properties of the network determine the hyperparameters of the process. This Wishart process description allows us to, for the first time: give necessary and sufficient conditions for a QNN architecture to have a Gaussian process limit; calculate the full gradient distribution, generalizing previously known barren plateau results; and calculate the local…
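The classical statement in the abstract's first sentence can be checked numerically with a small Python sketch. The setup below is an assumption made for illustration, not taken from the paper: a one-hidden-layer tanh network with standard normal weights, output scaled by 1/sqrt(width), evaluated at one fixed input over many random initializations. The function names are hypothetical.

```python
import numpy as np

def network_outputs(width, n_inits, x, rng):
    """Outputs f(x) = sum_j v_j * tanh(w_j . x) / sqrt(width) over
    n_inits independent random initializations of (w, v)."""
    w = rng.standard_normal((n_inits, width, x.shape[0]))
    v = rng.standard_normal((n_inits, width))
    hidden = np.tanh(w @ x)  # shape (n_inits, width)
    return (v * hidden).sum(axis=1) / np.sqrt(width)

def excess_kurtosis(y):
    """Sample excess kurtosis: zero for a Gaussian distribution."""
    z = (y - y.mean()) / y.std()
    return float(np.mean(z**4) - 3.0)

rng = np.random.default_rng(0)
x = np.array([1.0, -0.5])  # arbitrary fixed input
kurt_narrow = excess_kurtosis(network_outputs(1, 5000, x, rng))
kurt_wide = excess_kurtosis(network_outputs(256, 5000, x, rng))

print(f"excess kurtosis, width=1:   {kurt_narrow:.2f}")
print(f"excess kurtosis, width=256: {kurt_wide:.2f}")
```

Because the hidden units are independent, the excess kurtosis of the output falls off as 1/width, so the wide network's output distribution is close to Gaussian while the single-neuron network's is heavy-tailed. This checks only the one-point marginal, not the full process.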
Peer Reviews
Decision: ICLR 2025 Poster
- The article is very well structured and seems technically of excellent quality. (However, I did not check or verify the proofs in the Appendix in detail.)
- The overall question on the structure of quantum neural networks and their potential advantages is relevant, and the new findings advance the field. The new Wishart process framework introduced in this article seems useful, as demonstrated by the derived results mentioned in the summary.
- The results are not groundbreakingly new but are generalizations of previous results, formulated in a more general language and setting.
- The results are a bit underwhelming, essentially "it seems unlikely that there exists any computational quantum advantage during the training of QNNs", which already seemed unlikely before. But this is expected, and this evidence is also valuable, so this is not a strong weakness.
- The main text is hard to understand without the long Appendix. However, this s
The paper is written very well and tackles a valid problem. The theoretical analysis appears to be well performed and the authors have done a lot to highlight the connection to prior results.
The main weakness I see is in the discussion around limits. It isn't always clear in which limit the authors are discussing the behaviour of the networks. It is also reasonable that a classical neural network's neural tangent kernel matrix has eigenvalues following a Wishart distribution and evolving during training. Further, a covariance matrix used to describe a GPR will likely also look very Wishart. While the math and results were presented in the paper, it was never completely clear to me how
1) Several new ideas are introduced from theoretical physics / quantum mechanics and connected with neural networks.
2) The claimed result is a major generalization of major theorems in "classical" neural networks such as the infinite-width limit of neural networks (Gaussian processes) or Neural Tangent Kernel theory. While several results are derived from this for quantum NNs, there is potential that these ideas can also lead to new results for classical NNs, e.g. via the correspondence princip
1) Technical: The paper lacks basic definitions and introduction of notation, or precision in doing so, that are crucial to follow the train of thought. This is detrimental as several parts of the notation are non-standard, or at least non-standard within the subsections and audiences of this publication venue. For example, in Theorem 1, a notation is introduced as "....denotes the projection into a (Jordan sub-algebra) A_α ....", first mention in the introduction around eq 2. Nowhere in the mai
Taxonomy
Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing · Quantum Computing Algorithms and Architecture
Methods: Gaussian Process
