Likelihood-Preserving Embeddings for Statistical Inference

Deniz Akdemir

arXiv:2512.22638 · stat.ML · December 30, 2025

Open Access

TL;DR

This paper introduces a theoretical framework for likelihood-preserving embeddings that maintain the integrity of likelihood-based inference when data is compressed into learned representations, with practical neural network methods and validation on distributions and clinical data.

Contribution

It develops the Likelihood-Ratio Distortion metric and the Hinge Theorem, providing conditions under which embeddings preserve statistical inference, and offers a neural network-based constructive approach.

Findings

01

Controlling the distortion $Δ_{n}$ preserves likelihood-based tests.

02

Neural network embeddings can approximate sufficient statistics with provable guarantees.

03

Experiments confirm the phase transition and practical effectiveness of the method.

Abstract

Modern machine learning embeddings provide powerful compression of high-dimensional data, yet they typically destroy the geometric structure required for classical likelihood-based statistical inference. This paper develops a rigorous theory of likelihood-preserving embeddings: learned representations that can replace raw data in likelihood-based workflows -- hypothesis testing, confidence interval construction, model selection -- without altering inferential conclusions. We introduce the Likelihood-Ratio Distortion metric $Δ_{n}$, which measures the maximum error in log-likelihood ratios induced by an embedding. Our main theoretical contribution is the Hinge Theorem, which establishes that controlling $Δ_{n}$ is necessary and sufficient for preserving inference. Specifically, if the distortion satisfies $Δ_{n} = o_{p}(1)$, then (i) all likelihood-ratio based tests and Bayes…
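To make the idea of a maximum error in log-likelihood ratios concrete, here is a minimal Python sketch. It is an illustration only, not the paper's construction: the exact definition of $Δ_{n}$ is not given in this excerpt. The Gaussian location model, the grid of parameter values, and the function names (`loglik_raw`, `loglik_mean`, `lr_distortion`) are all assumptions chosen for the example. It compares a sufficient embedding (the full sample mean) with a lossy one (the mean of the first half of the sample).

```python
import numpy as np

def loglik_raw(x, theta):
    # Log-likelihood of i.i.d. N(theta, 1) data, up to a constant free of theta.
    return -0.5 * np.sum((x - theta) ** 2)

def loglik_mean(t, m, theta):
    # Log-likelihood of a mean of m observations, t ~ N(theta, 1/m), up to a constant.
    return -0.5 * m * (t - theta) ** 2

def lr_distortion(x, t, m, theta0, grid):
    # Largest gap between raw and embedded log-likelihood ratios over the grid
    # (a grid maximum is an assumed stand-in for the paper's metric).
    raw = np.array([loglik_raw(x, th) - loglik_raw(x, theta0) for th in grid])
    emb = np.array([loglik_mean(t, m, th) - loglik_mean(t, m, theta0) for th in grid])
    return float(np.max(np.abs(raw - emb)))

rng = np.random.default_rng(0)
n = 200
x = rng.normal(loc=0.3, scale=1.0, size=n)
grid = np.linspace(-1.0, 1.0, 41)

# Embedding 1: the full sample mean, a sufficient statistic for theta.
full = lr_distortion(x, x.mean(), n, 0.0, grid)
# Embedding 2: the mean of the first half only, which discards information.
half = lr_distortion(x, x[: n // 2].mean(), n // 2, 0.0, grid)
print(full, half)
```

In this toy model the sufficient embedding reproduces every log-likelihood ratio up to floating-point error, while the half-sample embedding does not. That is the kind of gap a distortion measure of this type is meant to quantify.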

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning