On Learning-Curve Monotonicity for Maximum Likelihood Estimators

Mark Sellke; Steven Yin

arXiv:2512.10220·math.ST·December 29, 2025

Open Access

TL;DR

This paper establishes the first nontrivial guarantees of learning-curve monotonicity for maximum likelihood estimators in specific well-specified parametric models, meaning that the estimator's average performance only improves as more data is observed.

Contribution

The paper provides the first monotonicity guarantees for the MLE in well-specified parametric settings, including Gaussian and Gamma models. GPT-5.2 Pro was used for proof development.

Findings

01

Monotonicity of forward KL divergence for Gaussian vectors with unknown covariance.

02

Complete monotonicity of the forward KL divergence in the Gaussian settings (known or unknown mean) and for Gamma variables with unknown scale parameter.

03

Monotonicity results for reverse KL divergence in exponential families.

Abstract

The property of learning-curve monotonicity, highlighted in a recent series of works by Loog, Mey and Viering, describes algorithms which only improve in average performance given more data, for any underlying data distribution within a given family. We establish the first nontrivial monotonicity guarantees for the maximum likelihood estimator in a variety of well-specified parametric settings. For sequential prediction with log loss, we show monotonicity (in fact complete monotonicity) of the forward KL divergence for Gaussian vectors with unknown covariance and either known or unknown mean, as well as for Gamma variables with unknown scale parameter. The Gaussian setting was explicitly highlighted as open in the aforementioned works, even in dimension 1. Finally, we observe that for reverse KL divergence, a folklore trick yields monotonicity for very general exponential families. All…
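
As a sanity check on the simplest Gaussian case mentioned in the abstract (dimension 1, known mean 0, unknown variance), the expected KL divergences of the MLE have closed forms, so the learning curve can be tabulated directly. The sketch below assumes that forward KL means KL(true || fitted) and reverse KL means KL(fitted || true). The function names are hypothetical, and this illustrates the statement rather than the paper's proof technique. It relies on standard facts: n times the MLE variance divided by the true variance is chi-squared with n degrees of freedom, E[1/chi2_n] = 1/(n - 2) for n > 2, and E[log chi2_n] = psi(n/2) + log 2, where psi is the digamma function.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_half_integer(n):
    """Return psi(n / 2) for a positive integer n.

    Uses psi(1) = -gamma, psi(1/2) = -gamma - 2 log 2,
    and the recurrence psi(x + 1) = psi(x) + 1 / x.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n % 2 == 0:
        x, value = 1.0, -EULER_GAMMA
    else:
        x, value = 0.5, -EULER_GAMMA - 2.0 * math.log(2.0)
    while x < n / 2 - 1e-12:
        value += 1.0 / x
        x += 1.0
    return value


def expected_forward_kl(n):
    """E[KL(N(0, s2) || N(0, s2_hat))] with s2_hat = mean(x_i^2) (assumed setup).

    The expectation is infinite for n <= 2, so only n >= 3 is accepted.
    """
    if n < 3:
        raise ValueError("expectation is infinite for n < 3")
    return 0.5 * (n / (n - 2) - 1.0 + digamma_half_integer(n) - math.log(n / 2))


def expected_reverse_kl(n):
    """E[KL(N(0, s2_hat) || N(0, s2))] for the same MLE (assumed setup)."""
    return 0.5 * (math.log(n / 2) - digamma_half_integer(n))


if __name__ == "__main__":
    # Tabulate both learning curves; neither depends on the true variance.
    for n in range(3, 11):
        print(n, expected_forward_kl(n), expected_reverse_kl(n))
```

Under these assumptions both curves decrease in n over the tabulated range. The forward curve starts at n = 3 because the expectation is infinite for smaller samples.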

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning