Semantic Identity Compression: Zero-Error Laws, Rate-Distortion, and Neurosymbolic Necessity

Tristan Simas

arXiv:2601.14252 · cs.IT · May 4, 2026

TL;DR

This paper establishes finite limits on semantic identity compression in neural and symbolic systems, showing that the collision-fiber geometry of the representation map determines how much extra information exact identity recovery requires.

Contribution

It derives finite laws, including a fiberwise rate-distortion law, from collision-fiber geometry, linking the structural ambiguity of compressed representations to the need for symbolic identity mechanisms.

Findings

01. Derived a tight fixed-length converse law for identity recovery.
02. Established an exact fiberwise rate-distortion law for finite sources.
03. Connected collision-fiber geometry to query complexity and system structure.

Abstract

Symbolic systems operate over precise identities: variables denote specific objects, pointers target precise memory locations, and database keys refer to singular records. Neural embeddings generalize by compressing away semantic detail, but this compression creates collision ambiguity: multiple distinct entities can share the same representation value. Exact identity recovery requires additional information precisely when representation fibers have size greater than one. The residual cost is controlled by a single combinatorial object: the collision-fiber geometry of the representation map $\pi$. Let $A_{\pi} = \max_{u} |\pi^{-1}(u)|$ be the size of the largest collision fiber. The finite laws include a tight fixed-length converse $L \geq \log_{2} A_{\pi}$, an exact finite-block scaling law, a pointwise adaptive budget $\lceil \log_{2} |\pi^{-1}(u)| \rceil$, and an exact fiberwise rate-distortion law for…
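The quantities in the abstract can be computed directly for a small finite source. The following is a minimal Python sketch, assuming the source is a finite list and $\pi$ is a deterministic Python function; the helper names (`collision_fibers`, `encode`, `decode`, and so on) and the "index within the fiber" encoder are illustrative choices, not the paper's construction. The converse is a pigeonhole argument: the $A_{\pi}$ items in the largest fiber share one value $u$, so they need pairwise distinct side-information strings, and only $2^{L}$ strings of length $L$ exist. For an integer bit count, $L \geq \log_{2} A_{\pi}$ is the same as $L \geq \lceil \log_{2} A_{\pi} \rceil$, and sending each item's position inside its own fiber meets that bound.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Fibers = Dict[Hashable, List[Hashable]]


def collision_fibers(source: Iterable[Hashable], pi: Callable[[Hashable], Hashable]) -> Fibers:
    """Group source items by representation value: fibers[u] lists pi^{-1}(u)."""
    fibers: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for x in source:
        fibers[pi(x)].append(x)
    return dict(fibers)


def ceil_log2(n: int) -> int:
    """Exact integer ceil(log2 n) for n >= 1, avoiding floating-point rounding."""
    return (n - 1).bit_length()


def max_fiber_size(fibers: Fibers) -> int:
    """A_pi = max_u |pi^{-1}(u)|, the size of the largest collision fiber."""
    return max(len(f) for f in fibers.values())


def fixed_length_budget(fibers: Fibers) -> int:
    """Fewest side-information bits L with 2**L >= A_pi, i.e. ceil(log2 A_pi)."""
    return ceil_log2(max_fiber_size(fibers))


def adaptive_budget(fibers: Fibers, u: Hashable) -> int:
    """Pointwise budget ceil(log2 |pi^{-1}(u)|); singleton fibers cost 0 bits."""
    return ceil_log2(len(fibers[u]))


def encode(x: Hashable, pi: Callable[[Hashable], Hashable], fibers: Fibers) -> Tuple[Hashable, int]:
    """Hypothetical encoder: side information is x's position inside its own fiber."""
    u = pi(x)
    return u, fibers[u].index(x)


def decode(u: Hashable, index: int, fibers: Fibers) -> Hashable:
    """Exact identity recovery from the representation value plus the fiber index."""
    return fibers[u][index]


def residue_mod5(x: int) -> int:
    """Toy representation map: collapses integers to their residue mod 5."""
    return x % 5


if __name__ == "__main__":
    source = list(range(12))
    fibers = collision_fibers(source, residue_mod5)

    print(max_fiber_size(fibers))       # 3, from fibers {0, 5, 10} and {1, 6, 11}
    print(fixed_length_budget(fibers))  # 2
    print([adaptive_budget(fibers, u) for u in range(5)])  # [2, 2, 1, 1, 1]

    # Every item is recovered exactly from (pi(x), index within its fiber).
    assert all(decode(*encode(x, residue_mod5, fibers), fibers) == x for x in source)
```

Each index is smaller than its fiber's size, which is at most $A_{\pi} \leq 2^{L}$, so it fits in the fixed budget; the adaptive budget spends fewer bits on small fibers and none at all when $|\pi^{-1}(u)| = 1$, matching the abstract's statement that extra information is needed only for fibers of size greater than one.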
