Likelihood-Preserving Embeddings for Statistical Inference
Deniz Akdemir

TL;DR
This paper introduces a theoretical framework for likelihood-preserving embeddings that maintain the integrity of likelihood-based inference when data is compressed into learned representations, with practical neural network methods and validation on distributions and clinical data.
Contribution
It develops the Likelihood-Ratio Distortion metric and the Hinge Theorem, providing conditions under which embeddings preserve statistical inference, and offers a neural network-based constructive approach.
Findings
Controlling the distortion $\Delta_n$ preserves likelihood-based tests.
Neural network embeddings can approximate sufficient statistics with provable guarantees.
Experiments confirm the phase transition and practical effectiveness of the method.
Abstract
Modern machine learning embeddings provide powerful compression of high-dimensional data, yet they typically destroy the geometric structure required for classical likelihood-based statistical inference. This paper develops a rigorous theory of likelihood-preserving embeddings: learned representations that can replace raw data in likelihood-based workflows (hypothesis testing, confidence interval construction, model selection) without altering inferential conclusions. We introduce the Likelihood-Ratio Distortion metric $\Delta_n$, which measures the maximum error in log-likelihood ratios induced by an embedding. Our main theoretical contribution is the Hinge Theorem, which establishes that controlling $\Delta_n$ is necessary and sufficient for preserving inference. Specifically, if the distortion satisfies , then (i) all likelihood-ratio based tests and Bayes…
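To make the distortion idea concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. It reads the Likelihood-Ratio Distortion as the largest absolute gap, over a grid of parameter values, between the log-likelihood ratio computed from the raw data and the one computed from the embedding alone. The toy model (i.i.d. $N(\theta, 1)$ data), the reference value `theta0`, the grid, and all function names (`raw_loglik`, `embedded_loglik`, `lr_distortion`, `sufficient`, `half_sample`) are hypothetical choices for illustration; the paper's exact definition may differ.

```python
import numpy as np

def raw_loglik(x, thetas):
    # Log-likelihood of i.i.d. N(theta, 1) data, dropping theta-free constants.
    return -0.5 * ((x[:, None] - thetas[None, :]) ** 2).sum(axis=0)

def embedded_loglik(t_mean, t_count, thetas):
    # Log-likelihood of an average of t_count N(theta, 1) draws,
    # which is N(theta, 1 / t_count), again dropping constants.
    return -0.5 * t_count * (t_mean - thetas) ** 2

def lr_distortion(x, embed, thetas, theta0=0.0):
    # Assumed reading of the metric: max over the grid of
    # |log-LR from raw data - log-LR from the embedding|, both vs. theta0.
    ref = np.array([theta0])
    t_mean, t_count = embed(x)
    raw_lr = raw_loglik(x, thetas) - raw_loglik(x, ref)
    emb_lr = (embedded_loglik(t_mean, t_count, thetas)
              - embedded_loglik(t_mean, t_count, ref))
    return float(np.max(np.abs(raw_lr - emb_lr)))

def sufficient(x):
    # The sample mean is a sufficient statistic for theta in this model.
    return x.mean(), x.size

def half_sample(x):
    # A lossy embedding: keep only the mean of the first half of the data.
    m = x.size // 2
    return x[:m].mean(), m

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=200)
thetas = np.linspace(-1.0, 1.0, 201)

delta_sufficient = lr_distortion(x, sufficient, thetas)
delta_lossy = lr_distortion(x, half_sample, thetas)
print(delta_sufficient, delta_lossy)
```

In this toy setting the sufficient embedding reproduces every log-likelihood ratio up to floating-point error, while the half-sample embedding does not, so tests built on it can disagree with tests built on the full data.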
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
