Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
Ke Sun

TL;DR
This paper investigates the Fisher information metric on neural network parameter spaces, providing deterministic bounds and an efficient unbiased random estimator to facilitate scalable computation of the neuromanifold metric.
Contribution
It introduces a novel low-dimensional core space analysis, extends deterministic bounds to the neuromanifold, and proposes an efficient unbiased estimator using Hutchinson's method.
Findings
Derived deterministic bounds for the Fisher metric tensor.
Developed an unbiased random estimator with bounded standard deviation.
Efficient computation with a single backward pass per batch.
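The findings above can be illustrated with a minimal NumPy sketch of a Hutchinson-style estimator for the diagonal of the Fisher information matrix. This is not the paper's implementation. It assumes a toy linear softmax classifier with logits `z = W @ x`, Rademacher probe vectors, and a scalar of the form `sum_i sqrt(p_i) * xi_i * log p_i` with `sqrt(p_i)` held constant, whose gradient is squared elementwise. The names `exact_fim_diag` and `probe_gradients` are hypothetical. Because the model is linear, the gradient is written in closed form rather than obtained by automatic differentiation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def exact_fim_diag(W, x):
    """Diagonal of J^T (diag(p) - p p^T) J for logits z = W @ x (toy model, an assumption)."""
    p = softmax(W @ x)
    return np.outer(p * (1.0 - p), x ** 2)

def probe_gradients(W, x, n_probes, rng):
    """One gradient per Rademacher probe; E[g g^T] equals the FIM of the toy model."""
    p = softmax(W @ x)
    s = np.sqrt(p)  # treated as a constant, i.e. "detached"
    xi = rng.choice([-1.0, 1.0], size=(n_probes, p.size))
    # Gradient of sum_i s_i * xi_i * log p_i with respect to the logits:
    # sum_i s_i * xi_i * (e_i - p) = s * xi - p * (s . xi)
    v = xi * s - np.outer(xi @ s, p)
    # Chain rule through z = W @ x: the gradient with respect to W is outer(v, x).
    return v[:, :, None] * x[None, None, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x = rng.normal(size=3)

samples = probe_gradients(W, x, n_probes=20000, rng=rng) ** 2
approx = samples.mean(axis=0)       # Monte Carlo estimate of the FIM diagonal
exact = exact_fim_diag(W, x)
rel_err = np.abs(approx - exact).max() / exact.max()
cv = samples.std(axis=0) / approx   # per-entry coefficient of variation of one probe

print("max relative error:", rel_err)
print("max single-probe CV:", cv.max())
```

In this toy setting each probe costs one gradient evaluation, and for Rademacher probes the single-probe standard deviation of each diagonal entry is at most √2 times its mean, which is the kind of "bounded by the true value up to scaling" behavior the findings describe.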
Abstract
The high-dimensional parameter space of deep neural networks -- the neuromanifold -- is endowed with a unique metric tensor defined by the Fisher information. Reliable and scalable computation of this metric tensor is valuable for theorists and practitioners. Focusing on neural classifiers, we return to a low-dimensional space of probability distributions, which we call the core space, and examine the spectrum and envelopes of its Fisher information matrix. We extend our discoveries there to deterministic bounds for the metric tensor on the neuromanifold. We introduce an unbiased random estimator based on Hutchinson's trace method and derive related bounds. It can be evaluated efficiently with a single backward pass per batch, with a standard deviation bounded by the true value up to scaling.
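As a rough illustration of the low-dimensional object the abstract refers to, the sketch below builds the Fisher information matrix of a categorical distribution parameterized by logits, `diag(p) - p p^T`, and checks a few elementary spectral facts. Treating this matrix as the core-space FIM is an assumption made here for illustration. The checks are textbook properties and are not claimed to be the paper's theorems. The function names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def core_fim(z):
    """FIM of a categorical distribution with respect to its logits."""
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

def core_fim_from_scores(z):
    """Same matrix from the definition E_y[(e_y - p)(e_y - p)^T], y ~ p."""
    p = softmax(z)
    scores = np.eye(p.size) - p  # row y is the score vector e_y - p
    return (scores * p[:, None]).T @ scores

z = np.array([2.0, 0.5, -1.0, 0.0])
p = softmax(z)
C = core_fim(z)
eigvals = np.linalg.eigvalsh(C)  # eigenvalues in ascending order

print("eigenvalues:", eigvals)
print("trace:", np.trace(C), "1 - ||p||^2:", 1.0 - p @ p)
print("largest eigenvalue <= max p:", eigvals[-1] <= p.max())
```

For a network with logits `z(theta)` and Jacobian `J = dz/dtheta`, the parameter-space metric for a single input is the pullback `J^T (diag(p) - p p^T) J`, so spectral bounds on this small matrix translate into bounds on the much larger one. For example, `diag(p) - p p^T ⪯ diag(p)` implies `J^T (diag(p) - p p^T) J ⪯ J^T diag(p) J`.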
Peer Reviews
Decision: ICLR 2026 Poster
- Solid mathematical foundation: All major theorems have complete proofs in the appendix. The spectral analysis of the simplex FIM (Theorem 1) and envelope characterizations (Lemma 2) are rigorously established.
- Novel computational approach: The Hutchinson estimator with “detach-and-mix” construction provides unbiased FIM estimates with provably bounded diagonal CV ≤ √2, addressing variance issues in Monte Carlo methods.
- Comprehensive theoretical analysis: The paper provides both deterministic b
- Experiments: Only DistilBERT is tested, on AG News and SST-2. Missing: (i) comparison with K-FAC, diagonal empirical Fisher (Adam), and exact Gauss-Newton; (ii) optimization performance metrics; (iii) other architectures (CNNs, ResNets). A general expansion in this area would be welcome, although the paper notes that each formulation may require a from-scratch derivation. Please comment on this.
- Unclear practical advantage: No demonstration of how the estimator improves optimization or learning dyn
- The problem of approximating the Fisher metric in neural network settings is timely and very important. The proposed estimators are simple and can be easily computed in practice.
- The technical part of the paper appears to be sound and the theoretical claims correct, but I have not followed every proof in detail. The experiments support the theoretical analysis.
- The paper is generally well written and accessible.
- Since the paper proposes an estimator for the Fisher metric, I would expect additional empirical demonstrations regarding the estimator’s accuracy and efficiency. I acknowledge that the theoretical results constitute the primary contribution, but I believe that further benchmark experiments would help demonstrate the properties of the estimator and also consider comparisons to related methods. The current experimental section provides some insight, but additional comparisons would strengthen t
- Provides a unified and theoretically grounded framework for deterministic and stochastic FIM estimation with clear variance guarantees.
- The Hutchinson estimator is efficient and easy to integrate into deep learning pipelines, enabling scalable information-geometric analysis.
- The paper does not validate the estimator on analytically known probability distributions, especially in high-dimensional settings. This makes it difficult to verify the claimed accuracy of the proposed bounds and stochastic estimates.
- The practical impact of improved FIM accuracy on downstream tasks (e.g., optimization, generalization) is not demonstrated. In some real applications [a], approximation may even be preferable to precision, since neural networks themselves are inherently approx
Taxonomy
Topics: Tensor decomposition and applications · Advanced Neuroimaging Techniques and Applications · Elasticity and Material Modeling
