What Do Learned Models Measure?

Indrė Žliobaitė

arXiv:2601.18278 · cs.LG · January 27, 2026

TL;DR

This paper emphasizes the importance of measurement stability in learned models, showing that standard evaluation metrics do not guarantee consistent measurement functions across contexts, a consistency that scientific applications depend on.

Contribution

The paper formalizes measurement stability as a new evaluation criterion for learned measurement functions and demonstrates its importance through theoretical analysis and a real-world case study.

Findings

1. Standard evaluation metrics do not guarantee measurement stability.
2. Models with similar predictive accuracy can produce different measurement functions.
3. Distribution shifts can cause significant measurement inconsistencies.
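
A toy sketch can make the second and third findings concrete. It is a hypothetical setup, not the paper's case study: the target depends on a feature `x1`, and a proxy feature `x2` is almost a copy of `x1` in the training distribution. Model A learns to measure through `x1`, model B through `x2`. Both fit the training data about equally well, yet once the proxy link breaks under a distribution shift, the two learned measurement functions disagree substantially. The helper names (`make_data`, `fit_slope`, `mean_disagreement`) and all parameters are illustrative assumptions.

```python
import random


def make_data(n, coupling, rng):
    """Hypothetical toy data: y depends on x1; x2 is a proxy tied to x1 by `coupling`."""
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        # coupling=1 makes x2 a copy of x1; coupling=0 makes it independent noise
        x2 = coupling * x1 + (1.0 - coupling) * rng.gauss(0.0, 1.0)
        y = x1 + rng.gauss(0.0, 0.1)
        rows.append((x1, x2, y))
    return rows


def fit_slope(rows, j):
    """Least-squares slope through the origin using feature j only."""
    num = sum(r[j] * r[2] for r in rows)
    den = sum(r[j] ** 2 for r in rows)
    return num / den


def mse(rows, j, slope):
    """Mean squared prediction error of the single-feature model on rows."""
    return sum((slope * r[j] - r[2]) ** 2 for r in rows) / len(rows)


def mean_disagreement(rows, slope_a, slope_b):
    """Average absolute gap between the two learned measurement functions."""
    return sum(abs(slope_a * r[0] - slope_b * r[1]) for r in rows) / len(rows)


rng = random.Random(0)
train = make_data(2000, coupling=0.99, rng=rng)
shifted = make_data(2000, coupling=0.0, rng=rng)

slope_a = fit_slope(train, 0)  # model A measures via x1
slope_b = fit_slope(train, 1)  # model B measures via the proxy x2

mse_a, mse_b = mse(train, 0, slope_a), mse(train, 1, slope_b)
gap_train = mean_disagreement(train, slope_a, slope_b)
gap_shift = mean_disagreement(shifted, slope_a, slope_b)
print(f"train MSE: A={mse_a:.4f}  B={mse_b:.4f}")
print(f"mean disagreement: train={gap_train:.3f}  shifted={gap_shift:.3f}")
```

In this toy setting, error on the training distribution cannot tell the two models apart; only comparing their outputs in the shifted context exposes that they implement different measurement functions.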

Abstract

In many scientific and data-driven applications, machine learning models are increasingly used as measurement instruments, rather than merely as predictors of predefined labels. When the measurement function is learned from data, the mapping from observations to quantities is determined implicitly by the training distribution and inductive biases, allowing multiple inequivalent mappings to satisfy standard predictive evaluation criteria. We formalize learned measurement functions as a distinct focus of evaluation and introduce measurement stability, a property capturing invariance of the measured quantity across admissible realizations of the learning process and across contexts. We show that standard evaluation criteria in machine learning, including generalization error, calibration, and robustness, do not guarantee measurement stability. Through a real-world case study, we show that…
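
The abstract describes measurement stability as invariance of the measured quantity across admissible realizations of the learning process and across contexts. The sketch below shows one simple way to score that, assumed for illustration rather than taken from the paper: for every context, average over inputs the spread (max minus min) of the values reported by different realizations. The function name `instability_by_context` and the two example realizations are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Mapping, Sequence

# A learned measurement function maps an observation to a measured quantity.
MeasurementFn = Callable[[float], float]


def instability_by_context(
    realizations: Sequence[MeasurementFn],
    contexts: Mapping[str, Sequence[float]],
) -> Dict[str, float]:
    """Per context, the mean over inputs of (max - min) across realizations.

    A score of 0 means every realization reports the same value on every
    input of that context; larger scores mean less stable measurement.
    """
    scores = {}
    for name, observations in contexts.items():
        spreads = []
        for x in observations:
            values = [f(x) for f in realizations]
            spreads.append(max(values) - min(values))
        scores[name] = mean(spreads)
    return scores


# Two hypothetical realizations that coincide on the training range (x <= 1)
# but extrapolate differently beyond it.
def realization_a(x: float) -> float:
    return 2.0 * x


def realization_b(x: float) -> float:
    return 2.0 * x + 0.5 * max(0.0, x - 1.0)


scores = instability_by_context(
    [realization_a, realization_b],
    {"training range": [0.0, 0.25, 0.5, 0.75, 1.0], "shifted": [2.0, 3.0]},
)
print(scores)  # {'training range': 0.0, 'shifted': 0.75}
```

The spread is only one possible choice; a variance across realizations or a worst case over inputs would serve equally well. In practice the realizations might come from retraining with different seeds or data samples.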

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference