Physics-Aware, Shannon-Optimal Compression via Arithmetic Coding for Distributional Fidelity
Cristiano Fanelli

TL;DR
This paper introduces a physics-informed, Shannon-optimal compression method using arithmetic coding to measure dataset fidelity in high-dimensional settings, providing an absolute, interpretable metric based on codelength differences.
Contribution
It proposes a novel information-theoretic approach that uses lossless compression as an absolute measure of distributional fidelity grounded in physical correlations.
Findings
Achieves improved compression over gzip.
Provides a physically meaningful fidelity metric in bits.
Establishes arithmetic coding as a tool for absolute fidelity assessment.
Abstract
Assessing whether two datasets are distributionally consistent is central to modern scientific analysis, particularly as generative artificial intelligence produces synthetic data whose fidelity must be validated against real observations in increasingly high-dimensional settings. Existing approaches are typically relative: they determine whether one dataset is more consistent with a reference than another, but do not provide a physically grounded absolute standard for fidelity. We propose an information-theoretic approach in which lossless compression via arithmetic coding provides an operational measure of dataset fidelity under a physics-informed probabilistic representation. Datasets sharing the same underlying physical correlations admit comparable optimal descriptions, while discrepancies (arising from miscalibration, mismodeling, or bias) manifest as an irreducible excess in…
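The codelength idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the truncated abstract does not specify the physics-informed model, so a Laplace-smoothed categorical model over a small discrete alphabet stands in for it here, and all names (`fit_model`, `codelength_bits`, the toy datasets) are hypothetical. The sketch relies only on the standard fact that an arithmetic coder driven by a model q encodes a sequence in roughly -sum log2 q(x) bits, so a dataset that deviates from the reference statistics costs more bits per symbol.

```python
import math
from collections import Counter


def fit_model(symbols, alphabet, alpha=1.0):
    """Laplace-smoothed categorical model (illustrative stand-in for a
    physics-informed probabilistic model, which is not specified here)."""
    counts = Counter(symbols)
    total = len(symbols) + alpha * len(alphabet)
    return {a: (counts[a] + alpha) / total for a in alphabet}


def codelength_bits(symbols, model):
    """Ideal codelength -sum log2 q(x) in bits; an arithmetic coder using
    the same model attains this up to a small constant overhead."""
    return -sum(math.log2(model[s]) for s in symbols)


# Toy data (assumed for illustration only).
alphabet = [0, 1, 2]
reference = [0] * 500 + [1] * 300 + [2] * 200   # defines the model
consistent = [0] * 50 + [1] * 30 + [2] * 20     # same proportions
biased = [0] * 20 + [1] * 30 + [2] * 50         # shifted proportions

model = fit_model(reference, alphabet)

bits_consistent = codelength_bits(consistent, model) / len(consistent)
bits_biased = codelength_bits(biased, model) / len(biased)

# Excess codelength per symbol, in bits, of the biased dataset relative
# to the consistent one under the reference model.
excess = bits_biased - bits_consistent

print(f"consistent: {bits_consistent:.3f} bits/symbol")
print(f"biased:     {bits_biased:.3f} bits/symbol")
print(f"excess:     {excess:.3f} bits/symbol")
```

In this toy setting the excess is positive because the biased dataset places more mass on symbols the reference model considers unlikely. In expectation, the per-symbol excess relative to the optimal code for the data's own distribution is the Kullback-Leibler divergence between that distribution and the model, which is what makes a gap measured in bits interpretable.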
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques · Error Correcting Code Techniques
