Information Geometry and Asymptotic Theory for SMML Estimators
Enes Makalic, Daniel F. Schmidt

TL;DR
This paper explores the geometric and asymptotic properties of SMML estimators, revealing their connection to Fisher-Rao geometry, Voronoi tessellations, and divergence-based centroids in statistical models.
Contribution
It provides a novel geometric interpretation of SMML estimators using information geometry and asymptotic analysis, linking coding principles to divergence geometry.
Findings
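The SMML objective decomposes into assertion entropy and conditional cross-entropy.
Under high-resolution regularity conditions, optimal SMML partitions are asymptotically the pullback, through the maximum likelihood estimator, of weighted Fisher-Rao Voronoi tessellations in parameter space.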
For exponential families, SMML codepoints satisfy a moment-matching condition.
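The first and third findings can be written out as a short worked derivation. The notation below is an assumption and is not taken from the paper: a countable sample space partitioned into cells $X_j$, marginal data distribution $r(x)$, assertion probabilities $q_j$, codepoints $\theta_j$, and an exponential family $f(x \mid \theta) = h(x)\exp\{\theta^\top t(x) - A(\theta)\}$. It is a sketch of the standard SMML objective, and the paper's own statement may differ in detail.

```latex
% Assumed notation: r(x) is the marginal distribution of the data,
% q_j = \sum_{x \in X_j} r(x), and r_j(x) = r(x)/q_j for x \in X_j.
\begin{align}
I_1 &= -\sum_j q_j \log q_j \;-\; \sum_j \sum_{x \in X_j} r(x) \log f(x \mid \theta_j) \\
    &= \underbrace{H(q)}_{\text{assertion entropy}}
       \;+\; \underbrace{\sum_j q_j \, \mathbb{E}_{r_j}\!\left[-\log f(x \mid \theta_j)\right]}_{\text{conditional cross-entropy}} .
\end{align}
% Each cross-entropy term splits into a part that does not depend on
% \theta_j plus a Kullback-Leibler divergence:
\begin{equation}
\mathbb{E}_{r_j}\!\left[-\log f(x \mid \theta_j)\right]
  = H(r_j) + \mathrm{KL}\!\left(r_j \,\|\, f(\cdot \mid \theta_j)\right).
\end{equation}
% For the exponential family above, setting the gradient in \theta_j to
% zero gives the moment-matching condition:
\begin{equation}
\nabla A(\theta_j) = \mathbb{E}_{r_j}\!\left[t(x)\right].
\end{equation}
```

Under these assumptions, for a fixed partition only the divergence term depends on $\theta_j$, so minimising the message length over codepoints is the same as minimising the divergence cell by cell.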
Abstract
Strict minimum message length (SMML) is an information-theoretic coding principle that represents a continuous statistical model by a finite set of assertions and a partition of the sample space. We show that the SMML objective decomposes into assertion entropy and conditional cross-entropy, balancing the cost of identifying an assertion against the cost of encoding data under the assigned model. For any fixed partition, the optimal codepoint for each cell is the model distribution that minimises Kullback-Leibler divergence from the data distribution restricted to that cell. Using the local Fisher-Rao geometry of regular parametric models, we show that, under high-resolution regularity conditions, optimal SMML partitions are asymptotically the pullback, through the maximum likelihood estimator, of weighted Fisher-Rao Voronoi tessellations in parameter space, with assertion probabilities…
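The claim that, for a fixed partition, the optimal codepoint minimises the Kullback-Leibler divergence from the cell-restricted data distribution can be checked numerically in a toy setting. The sketch below is not the authors' code. It assumes a Poisson model, a hypothetical marginal distribution (a two-component Poisson mixture truncated to 0..30), and a hand-picked cell {3, ..., 7}. All function names are illustrative. For the Poisson family the minimiser should be the mean of the counts under the cell-restricted distribution, and the code compares that value against a brute-force grid search.

```python
import math
import numpy as np


def poisson_logpmf(x, lam):
    """Log-probability of counts x under a Poisson model with rate lam."""
    x = np.asarray(x, dtype=float)
    log_factorial = np.array([math.lgamma(v + 1.0) for v in x])
    return x * math.log(lam) - lam - log_factorial


def cell_distribution(marginal, cell):
    """Marginal data distribution restricted to a cell and renormalised."""
    w = np.array([marginal[x] for x in cell], dtype=float)
    return w / w.sum()


def cross_entropy(cell, weights, lam):
    """Expected code length of data in the cell under the Poisson(lam) model."""
    return float(-(weights * poisson_logpmf(cell, lam)).sum())


def moment_matching_codepoint(cell, weights):
    """Poisson rate whose mean equals the cell-restricted mean of x."""
    return float((weights * np.asarray(cell, dtype=float)).sum())


# Hypothetical marginal (an assumption for illustration only):
# an equal mixture of Poisson(2) and Poisson(8), truncated to 0..30.
xs = np.arange(31)
marginal = 0.5 * np.exp(poisson_logpmf(xs, 2.0)) + 0.5 * np.exp(poisson_logpmf(xs, 8.0))
marginal = marginal / marginal.sum()

# One hand-picked cell of the sample-space partition.
cell = [3, 4, 5, 6, 7]
weights = cell_distribution(marginal, cell)

# Closed-form codepoint from moment matching.
lam_star = moment_matching_codepoint(cell, weights)

# Brute-force minimiser of the cross-entropy on a grid with step 0.005.
grid = np.linspace(0.5, 15.0, 2901)
losses = [cross_entropy(cell, weights, lam) for lam in grid]
lam_grid = float(grid[int(np.argmin(losses))])

print("moment-matching codepoint:", lam_star)
print("grid-search codepoint:    ", lam_grid)
```

Because the cross-entropy and the divergence differ only by the entropy of the cell-restricted distribution, which does not depend on the rate, minimising one minimises the other. The two printed values should agree up to the grid resolution.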
